Term Weight Algorithm Oriented Terms: Low Frequency Rather Than Little Occurrences
Author(s) - Yiyi He, Tiejun Li, Yuhong Huang, Shijie Li, Yanhuang Jiang
Publication year - 2020
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2020.09.079
Subject(s) - term (time) , computer science , logarithm , tf–idf , algorithm , feature (linguistics) , cover (algebra) , homogeneous , inverse , frequency , data mining , pattern recognition (psychology) , artificial intelligence , statistics , mathematics , engineering , mechanical engineering , mathematical analysis , linguistics , philosophy , physics , geometry , quantum mechanics , combinatorics
Term weight algorithms based on inverse document frequency are widely used to express the characteristic information of a text. Because frequently occurring terms generally carry less feature information about a text, terms with lower frequency are assigned higher weights. However, terms with very few occurrences, such as rare and misspelled terms, tend to convey unimportant or even erroneous information. To address this problem, this paper proposes a novel term weight algorithm that focuses on terms with low frequency rather than terms with few occurrences. Using statistics based on a non-homogeneous compression of term frequency, the algorithm highlights the contribution of terms within the frequency range of interest. A logarithmic function, combined with the number of terms that share the same frequency, is then used to weight terms of different frequencies according to their compression intervals. Compared with TF-IDF and SIF (smooth inverse frequency), the proposed approach performs similarly to SIF and slightly better than TF-IDF. The differences among these methods suggest that terms with low frequency, rather than terms with few occurrences, may dominate the feature information of a text.
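To make the problem described in the abstract concrete, the sketch below computes the two baseline weightings it compares against on a tiny, made-up corpus in which "algoritm" is a one-off misspelling. It assumes the common unsmoothed IDF variant log(N / df(t)) and the SIF weight a / (a + p(w)) with an assumed default a = 1e-3. The corpus, the function names (`idf_weights`, `tfidf_vector`, `sif_weights`) and the parameter value are hypothetical illustrations. This is not the paper's proposed algorithm, whose compression intervals are not specified in the abstract.

```python
import math
from collections import Counter

# Hypothetical toy corpus: "algoritm" is a misspelling that occurs only once.
DOCS = [
    "the term weight algorithm".split(),
    "the weight of a term".split(),
    "the algoritm is fast".split(),
    "term frequency and the algorithm".split(),
]


def idf_weights(docs):
    """Unsmoothed IDF, log(N / df(t)), for every term in the corpus."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(doc, idf):
    """Relative term frequency in `doc` multiplied by the corpus IDF."""
    tf = Counter(doc)
    return {term: (count / len(doc)) * idf[term] for term, count in tf.items()}


def sif_weights(docs, a=1e-3):
    """SIF weight a / (a + p(w)), where p(w) is the corpus unigram probability."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return {term: a / (a + count / total) for term, count in counts.items()}


if __name__ == "__main__":
    idf = idf_weights(DOCS)
    sif = sif_weights(DOCS)
    # Print the maximum weight of each scheme and every term that reaches it.
    for name, weights in (("IDF", idf), ("SIF", sif)):
        best = max(weights.values())
        tied = sorted(t for t, w in weights.items() if w == best)
        print(name, round(best, 3), tied)
```

Both weightings give their maximum value to every term seen only once, so the misspelling ties with legitimate rare words, while a term present in every document ("the") receives zero IDF. This is the "little occurrences" behaviour the abstract argues should not drive a text's feature weights.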