
Classification and analysis of literary works based on distribution weighted term frequency-inverse document frequency
Author(s) - Wei Dai
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
ISSN - 1742-6588
DOI - 10.1088/1742-6596/1941/1/012018
Subject(s) - tf–idf, weighting, computer science, term (time), word lists by frequency, vector space model, inverse, artificial intelligence, word (group theory), information retrieval, pattern recognition (psychology), data mining, mathematics, medicine, physics, geometry, quantum mechanics, sentence, radiology
Term Frequency-Inverse Document Frequency (TF-IDF) is a weighting technique commonly used in data mining and information retrieval. It evaluates the importance of a word to a text and is widely used in Internet search engines. To improve text analysis across different text types, a variety of weighted TF-IDF variants have been developed. For word analysis of literary works of various genres, this paper adopts Distribution Weighted Term Frequency-Inverse Document Frequency (TF-IDF-DW), which takes the distribution of feature items within and between classes into account and yields better screening results. For text classification, the classification results are obtained by applying the class center vector algorithm and the Bayesian algorithm to the comprehensive weight values obtained by TF-IDF-DW. Finally, the classification performance of the TF-IDF-DW algorithm is evaluated by comparison with the traditional TF-IDF method.
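
As a rough illustration of the baseline the abstract compares against, the following Python sketch computes plain TF-IDF weights and classifies a text by its nearest class center vector. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not give the TF-IDF-DW formula, so no distribution weighting is applied here, and the toy corpus, the genre labels, the helper names (`idf_table`, `tfidf`, `class_centers`, `cosine`, `classify`) and the use of cosine similarity are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def idf_table(docs):
    """Map each term to log(N / df), where df counts documents containing it."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(doc, idf):
    """Plain TF-IDF weights for one tokenized document (unseen terms get 0)."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * idf.get(t, 0.0) for t, c in counts.items()}


def class_centers(vectors, labels):
    """Average the vectors of each class to obtain its center vector."""
    sizes = Counter(labels)
    sums = {}
    for vec, label in zip(vectors, labels):
        center = sums.setdefault(label, Counter())
        for term, weight in vec.items():
            center[term] += weight
    return {
        label: {t: w / sizes[label] for t, w in center.items()}
        for label, center in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(vec, centers):
    """Return the label whose center vector is most similar to vec."""
    return max(centers, key=lambda label: cosine(vec, centers[label]))


# Hypothetical toy corpus: two "poetry" and two "mystery" texts, pre-tokenized.
train_docs = [
    ["moon", "love", "rose"],
    ["love", "rose", "night"],
    ["detective", "murder", "clue"],
    ["murder", "clue", "night"],
]
train_labels = ["poetry", "poetry", "mystery", "mystery"]

idf = idf_table(train_docs)
centers = class_centers([tfidf(d, idf) for d in train_docs], train_labels)

print(classify(tfidf(["love", "rose", "star"], idf), centers))
print(classify(tfidf(["murder", "clue"], idf), centers))
```

A distribution-weighted variant would, per the abstract, additionally scale each term weight by how the term is distributed within and between classes; that factor would be applied inside `tfidf` before the class centers are computed.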