Topic extraction using local graph centrality and semantic similarity
Author(s) - Rajangam Engels, Annamalai Chitra
Publication year - 2018
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5054
Subject(s) - automatic summarization, computer science, latent dirichlet allocation, centrality, artificial intelligence, natural language processing, semantic similarity, latent semantic analysis, graph, similarity (geometry), keyword extraction, topic model, information retrieval, mathematics, image (mathematics), statistics, theoretical computer science
Summary - Topic extraction is a challenging task in natural language processing and text mining, and it supports downstream tasks such as automated summarization, question answering, and personalized search. In this paper, we propose an unsupervised topic extraction method that uses semantic similarity, keyword significance, and graph centrality. First, we select semantically similar words from text documents. Next, we perform disambiguation to find the correct senses of the selected words. Then, we build a weighted graph using the semantic relationships and the significance of words in the text. Finally, we identify topic keywords using a novel concurrent local weighted centrality (LWWC) computed over the words represented as nodes in the graph. Using the standard annotated CiteULike and Brown datasets, we evaluated the results with precision, recall, and F‐measure. We show that the proposed method yields results comparable with the state-of-the-art LDA (Latent Dirichlet Allocation) and S‐LDA (Sparse‐LDA) topic extraction techniques. We also show that the proposed concurrent LWWC algorithm is more effective than existing generic centrality measures in networks of words. We verified the statistical significance of the improved effectiveness of our approach using one‐way analysis of variance (ANOVA) and Tukey's honestly significant difference (Tukey‐HSD) post‐hoc tests.
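
The graph-building and ranking steps described in the summary can be illustrated with a minimal sketch. This is not the paper's LWWC algorithm, whose definition is not given in this record. It substitutes plain weighted degree (the sum of incident edge weights) as a stand-in local centrality, and the edge-weight formula, the function names, and the sample scores are all assumptions made for illustration.

```python
from collections import defaultdict


def build_word_graph(similarities, significance):
    """Build an undirected weighted word graph.

    similarities: {(word_a, word_b): semantic similarity in [0, 1]}
    significance: {word: significance of the word in the text}
    The edge weight used here (similarity times mean significance)
    is a hypothetical choice, not the formula from the paper.
    """
    graph = defaultdict(dict)
    for (a, b), sim in similarities.items():
        weight = sim * (significance[a] + significance[b]) / 2
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def local_weighted_centrality(graph):
    """Stand-in local measure: sum of the weights of a node's edges."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def top_keywords(graph, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = local_weighted_centrality(graph)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Made-up similarity and significance values, for illustration only.
similarities = {
    ("graph", "network"): 0.8,
    ("graph", "node"): 0.6,
    ("network", "node"): 0.5,
    ("node", "banana"): 0.1,
}
significance = {"graph": 1.0, "network": 0.8, "node": 0.6, "banana": 0.2}

word_graph = build_word_graph(similarities, significance)
print(top_keywords(word_graph, 2))  # prints ['graph', 'network']
```

Because each score depends only on a node's own edges, the scores could be computed independently per node, which is one plausible reading of "concurrent" in the summary; the sketch computes them sequentially.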
