
Hierarchical text classification using Relative Inverse Document Frequency
Author(s) -
Boonthida Chiraratanasopha,
Thanaruk Theeramunkong,
Salin Boonbrahm
Publication year - 2021
Publication title -
ECTI Transactions on Computer and Information Technology
Language(s) - English
Resource type - Journals
ISSN - 2286-9131
DOI - 10.37936/ecti-cit.2021152.240515
Subject(s) - weighting , centroid , hierarchical clustering , computer science , classifier (uml) , artificial intelligence , term (time) , hierarchical database model , document classification , tree (set theory) , natural language processing , pattern recognition (psychology) , data mining , mathematics , cluster analysis , medicine , mathematical analysis , physics , quantum mechanics , radiology
Automatic hierarchical text classification is a challenging and much-needed task as hierarchical taxonomies proliferate with the growth of knowledge organization. A hierarchical structure captures dependency relationships between categories, in which general and specific concepts within the tree can overlap. This paper uses the frequency with which a term occurs in related categories of the hierarchical tree to aid document classification. Four extended term weightings based on Relative Inverse Document Frequency (IDFr), computed with respect to a term's own category, its parent category, its sibling categories, and its child categories, are used to build a classifier model with a centroid-based technique. In experiments on hierarchical text classification of Thai documents, IDFr achieved its best accuracy and F-measure, 53.65% and 50.80%, with the Top-n feature set under family-based evaluation; these are higher than TF-IDF by 2.35% and 1.15%, respectively, in the same settings.
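The abstract does not state the IDFr formula, so the following is only a minimal sketch of the general idea: a centroid-based classifier in which the document set used to compute IDF can be swapped from the whole corpus (plain TF-IDF) to the documents of related categories in the tree. The helper names (`related_docs`, `train_centroids`, `classify`), the smoothed IDF formula, the toy English corpus, and the use of a sibling-based IDF are all assumptions made for illustration and should not be read as the paper's definition.

```python
import math
from collections import Counter, defaultdict

def idf(term, docs):
    """Smoothed inverse document frequency of `term` over a list of token lists."""
    df = sum(1 for tokens in docs if term in tokens)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0

def related_docs(category, relation, parent, corpus):
    """Documents of the categories related to `category` (hypothetical helper).

    `relation` is "self", "parent", "siblings" or "children";
    `parent` maps each category to its parent (None for the root).
    """
    if relation == "self":
        cats = {category}
    elif relation == "parent":
        cats = {parent[category]}
    elif relation == "siblings":
        cats = {c for c, p in parent.items()
                if p == parent[category] and c != category}
    elif relation == "children":
        cats = {c for c, p in parent.items() if p == category}
    else:
        raise ValueError(relation)
    return [tokens for tokens, cat in corpus if cat in cats]

def unit(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def train_centroids(corpus, idf_docs_for):
    """Average the unit-length TF x IDF vectors of each category's documents.

    `idf_docs_for(category)` returns the token lists used to compute IDF
    for that category; this is the assumed plug-in point for a relative IDF.
    """
    sums, counts = defaultdict(Counter), Counter()
    for tokens, cat in corpus:
        idf_docs = idf_docs_for(cat)
        weights = {t: n * idf(t, idf_docs) for t, n in Counter(tokens).items()}
        for t, w in unit(weights).items():
            sums[cat][t] += w
        counts[cat] += 1
    return {cat: unit({t: w / counts[cat] for t, w in vec.items()})
            for cat, vec in sums.items()}

def classify(tokens, centroids):
    """Return the category whose centroid has the highest cosine similarity."""
    query = unit(Counter(tokens))
    def score(cat):
        return sum(w * centroids[cat].get(t, 0.0) for t, w in query.items())
    return max(centroids, key=score)

# Toy two-level hierarchy and corpus (illustrative data only).
parent = {"science": None, "physics": "science", "biology": "science"}
corpus = [
    (["quantum", "energy", "particle"], "physics"),
    (["energy", "force", "particle"], "physics"),
    (["cell", "gene", "energy"], "biology"),
    (["cell", "protein", "gene"], "biology"),
]
all_docs = [tokens for tokens, _ in corpus]

# Plain TF-IDF: IDF over the whole corpus.
tfidf_centroids = train_centroids(corpus, lambda cat: all_docs)
# Stand-in for a relative IDF: IDF over the sibling categories' documents.
sibling_centroids = train_centroids(
    corpus, lambda cat: related_docs(cat, "siblings", parent, corpus))

print(classify(["particle", "force"], tfidf_centroids))
print(classify(["gene", "cell"], sibling_centroids))
```

Passing a different `idf_docs_for` callback (for example one based on `"parent"` or `"children"`) changes only the weighting step, which keeps the centroid construction and the cosine-similarity decision rule identical across the variants being compared.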