Latent Dirichlet allocation-based temporal summarization
International Journal of Web Information Systems
ISSN: 1744-0084
Article publication date: 21 November 2018
Issue publication date: 7 March 2019
Abstract
Purpose
During crises such as accidents or disasters, an enormous volume of information is generated on the Web. Both the public and decision-makers often need to identify relevant and timely content, as soon as it appears online, that helps them understand what is happening and make the right decisions. However, relevant content can be scattered across document streams, and the available information can also contain redundant content published by different sources. Therefore, the need for automatic construction of summaries that aggregate important, non-redundant and up-to-date pieces of information is becoming critical.
Design/methodology/approach
This paper presents a new temporal summarization approach based on Latent Dirichlet Allocation, a topic model popular in the information retrieval field. The approach consists of filtering documents over streams, extracting relevant parts of information and then using topic modeling to reveal their underlying aspects, so that the most relevant and novel pieces of information can be extracted and added to the summary.
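The final selection step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each candidate sentence already has a topic-proportion vector inferred by Latent Dirichlet Allocation, and it assumes cosine similarity as the measure of both relevance and redundancy. The function names, the thresholds `rel_min` and `red_max`, and the example vectors are all hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def select_updates(stream, event_topics, rel_min=0.7, red_max=0.9):
    """Greedy pass over a time-ordered stream of (sentence, topic_vector) pairs.

    A sentence is kept when it is close enough to the event's topic vector
    (relevance) and not too close to any sentence already kept (novelty).
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    summary = []  # list of (sentence, topic_vector) pairs kept so far
    for sentence, topics in stream:
        if cosine(topics, event_topics) < rel_min:
            continue  # not relevant enough to the event
        if any(cosine(topics, kept) > red_max for _, kept in summary):
            continue  # redundant with an earlier update
        summary.append((sentence, topics))
    return [sentence for sentence, _ in summary]


# Hypothetical topic vectors over three topics (made up for illustration).
event = [0.8, 0.1, 0.1]
stream = [
    ("Quake of magnitude 7 hits the coast.", [0.75, 0.15, 0.10]),
    ("A magnitude 7 earthquake struck the coast.", [0.78, 0.12, 0.10]),
    ("Local team wins the championship.", [0.05, 0.05, 0.90]),
    ("Rescue teams report 20 people injured.", [0.45, 0.50, 0.05]),
]
result = select_updates(stream, event)
print(result)
```

With these made-up vectors, the first and fourth sentences are kept: the second is dropped as a near-duplicate of the first, and the third is dropped as off-topic. Working purely in topic space mirrors the abstract's description of relying on no external source of evidence, but the similarity measure and thresholds here are placeholders.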
Findings
The proposed approach was evaluated on the TREC Temporal Summarization 2014 framework. The results demonstrate its effectiveness in providing short and precise summaries of events.
Originality/value
Unlike most state-of-the-art approaches, the proposed method determines the importance of the pieces of information to be added to the summaries by relying solely on their representation in the topic space provided by Latent Dirichlet Allocation, without using any external source of evidence.
Acknowledgements
The authors offer their sincerest gratitude to M. Boughanem for his help, patience and precious advice. They would also like to thank the members of the IRIS team at IRIT as well as the OSIRIM staff for their hospitality and support during the authors' stay at the laboratory. In addition, the authors thank the Algerian Ministry of Higher Education and Scientific Research for seven months of financial support under the PNE fellowship program. The experiments presented in this paper were carried out using the OSIRIM platform, which is administered by IRIT and supported by CNRS, the Region Midi-Pyrénées, the French Government and ERDF (see http://osirim.irit.fr/site/en).
Citation
Tazibt, A.A. and Aoughlis, F. (2019), "Latent Dirichlet allocation-based temporal summarization", International Journal of Web Information Systems, Vol. 15 No. 1, pp. 83-102. https://doi.org/10.1108/IJWIS-04-2018-0023
Publisher
Emerald Publishing Limited
Copyright © 2018, Emerald Publishing Limited