Open Access
Improved TF-IDF for We Media Article Keywords Extraction
Author(s) -
XinXin Guan,
Yeli Li,
Hechen Gong
Publication year - 2019
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1302/3/032003
Subject(s) - tf–idf , computer science , keyword extraction , python (programming language) , recall rate , sentiment analysis , precision and recall , artificial intelligence , word (group theory) , recall , natural language processing , data mining , information retrieval , mathematics , linguistics , philosophy , physics , geometry , quantum mechanics , term (time) , operating system
Keyword extraction is one of the tasks of text topic mining, and it is also the basis of text analysis and public opinion analysis. The traditional TF-IDF algorithm weights candidate keywords mainly by word frequency. It does not consider the importance of feature words that occur less often, nor the readers' comments below the article. To address these problems, this paper improves the traditional TF-IDF algorithm by adding part of speech and reader comments as impact factors and recalculating the TF-IDF weight, which improves the accuracy of the algorithm. The paper uses Python to crawl We Media articles and to implement the improved algorithm. Experiments show that the improved TF-IDF algorithm significantly outperforms traditional TF-IDF in terms of accuracy, recall rate, F1, MacAvg_P, MacAvg_R and MacAvg_F1.
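The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: a plain TF-IDF score multiplied by a part-of-speech factor and a reader-comment factor. The constants `POS_WEIGHT`, `DEFAULT_POS_WEIGHT` and `COMMENT_BOOST`, the function names, the smoothed IDF variant, and the multiplicative way the factors are combined are all assumptions made for illustration, not the paper's actual method or values.

```python
import math
from collections import Counter

# Hypothetical impact factors; the abstract does not state the paper's values.
POS_WEIGHT = {"noun": 1.0, "verb": 0.6, "adj": 0.4}
DEFAULT_POS_WEIGHT = 0.2
COMMENT_BOOST = 0.5


def term_frequency(tokens):
    """Relative frequency of each token in one token list."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def inverse_document_frequency(corpus):
    """Smoothed IDF over a corpus given as a list of token lists."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1 for w, df in doc_freq.items()}


def tfidf_scores(article, corpus):
    """Traditional TF-IDF: depends only on word frequency."""
    tf = term_frequency(article)
    idf = inverse_document_frequency(corpus)
    return {w: tf[w] * idf.get(w, 1.0) for w in tf}


def improved_scores(article, corpus, pos_tags, comment_tokens):
    """TF-IDF rescaled by a part-of-speech factor and a comment factor.

    pos_tags maps each word to a coarse tag such as "noun" (assumed to come
    from an external tagger); comment_tokens are the tokenized reader comments.
    """
    base = tfidf_scores(article, corpus)
    comment_tf = term_frequency(comment_tokens)
    scores = {}
    for word, score in base.items():
        pos_factor = POS_WEIGHT.get(pos_tags.get(word), DEFAULT_POS_WEIGHT)
        comment_factor = 1 + COMMENT_BOOST * comment_tf.get(word, 0.0)
        scores[word] = score * pos_factor * comment_factor
    return scores


def top_keywords(scores, k=3):
    """Words sorted by descending score, truncated to k entries."""
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]
```

With these assumed factors, a frequent adjective that dominates the plain TF-IDF ranking can be overtaken by a less frequent noun that readers mention in the comments, which is the kind of effect the abstract describes.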