Topic-sensitive multi-document summarization algorithm
Author(s) -
Liu Na,
Tang Di,
Ying Lü,
Xiaojun Tang,
Haiwen Wang
Publication year - 2015
Publication title -
Computer Science and Information Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.244
H-Index - 24
eISSN - 2406-1018
pISSN - 1820-0214
DOI - 10.2298/csis140815060n
Subject(s) - automatic summarization, computer science, latent Dirichlet allocation, topic model, sentence, natural language processing, artificial intelligence, algorithm, information retrieval
Latent Dirichlet Allocation (LDA) has recently been used to automatically discover topics in text corpora and has been applied to sentence-extraction-based multi-document summarization algorithms. However, not all of the estimated topics are equally important or correspond to genuine themes of the domain: some are collections of irrelevant or background words, or represent insignificant themes. This paper proposes a topic-sensitive algorithm for multi-document summarization. Our approach is distinguished from existing ones in that it uses the LDA model to identify and select significant topics, which are then used in the sentence weight calculation. Moreover, besides topic features, the approach also considers statistical features such as term frequency, sentence position, and sentence length. It thus retains the advantages of statistical features while integrating them with the LDA topic model. Experiments showed that the proposed algorithm achieves better performance than other state-of-the-art algorithms on the DUC2002 corpus.
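The abstract describes sentence weighting that combines an LDA-derived topic significance score with statistical features (term frequency, sentence position, sentence length). The paper's exact formulas are not given here, so the following is only a minimal Python sketch of that general idea: it assumes the per-sentence topic weights have already been computed by an LDA model, and combines them linearly with illustrative statistical scores using hypothetical coefficients.

```python
from collections import Counter

def sentence_scores(sentences, topic_weights,
                    alpha=0.4, beta=0.2, gamma=0.2, delta=0.2):
    """Illustrative sentence scoring: combine a precomputed LDA topic
    weight per sentence with simple statistical features. The features
    and linear combination are assumptions, not the paper's formulas."""
    # Document-level term frequencies (crude whitespace tokenization).
    all_words = [w.lower() for s in sentences for w in s.split()]
    tf = Counter(all_words)
    max_tf = max(tf.values())
    n = len(sentences)
    avg_len = sum(len(s.split()) for s in sentences) / n

    scores = []
    for i, s in enumerate(sentences):
        words = [w.lower() for w in s.split()]
        # Term-frequency feature: average normalized frequency of the words.
        tf_score = sum(tf[w] for w in words) / (len(words) * max_tf)
        # Position feature: earlier sentences score higher.
        pos_score = 1.0 - i / n
        # Length feature: penalize sentences far from the average length.
        len_score = min(len(words), avg_len) / max(len(words), avg_len)
        scores.append(alpha * topic_weights[i] + beta * tf_score
                      + gamma * pos_score + delta * len_score)
    return scores
```

A summary would then be formed by taking the top-k sentences by score, e.g. `sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]`, typically with a redundancy check between selected sentences.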