Gamma-Poisson Distribution Model for Text Categorization
Author(s) - Hiroshi Ogura, Hiromi Amano, Masato Kondo
Publication year - 2013
Publication title - ISRN Artificial Intelligence
Language(s) - English
Resource type - Journals
eISSN - 2090-7443
pISSN - 2090-7435
DOI - 10.1155/2013/829630
Subject(s) - multinomial distribution, computer science, poisson distribution, dirichlet distribution, normalization (sociology), categorization, artificial intelligence, support vector machine, classifier (uml), text categorization, zipf's law, data mining, machine learning, natural language processing, statistics, mathematics, mathematical analysis, sociology, anthropology, boundary value problem
We introduce a new model for describing word frequency distributions in documents for automatic text classification tasks. In the model, the gamma-Poisson probability distribution is used to achieve better text modeling. The framework of the modeling and its application to text categorization are demonstrated with practical techniques for parameter estimation and vector normalization. To investigate the efficiency of our model, text categorization experiments were performed on the 20 Newsgroups, Reuters-21578, Industry Sector, and TechTC-100 datasets. The results show that the model achieves performance comparable to that of the support vector machine and clearly exceeds that of the multinomial and Dirichlet-multinomial models. The time complexity of the proposed classifier and its advantage in practical applications are also discussed.
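As a rough illustration of the idea in the abstract, the sketch below scores a document under a gamma-Poisson word-count model: each word count is treated as Poisson with a rate drawn from a gamma distribution, which gives a negative binomial marginal. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the (shape, rate) parameterization, the independence of words, and the toy parameter values are all hypothetical, and the paper's parameter estimation and vector normalization steps are not reproduced here.

```python
import math


def gamma_poisson_logpmf(x, alpha, beta):
    """Log-probability of count x when x ~ Poisson(lam) and
    lam ~ Gamma(shape=alpha, rate=beta) (assumed parameterization)."""
    return (
        math.lgamma(x + alpha) - math.lgamma(alpha) - math.lgamma(x + 1)
        + alpha * math.log(beta / (beta + 1.0))
        - x * math.log(beta + 1.0)
    )


def classify(counts, class_params, log_priors):
    """Return the class with the highest log-posterior score.

    counts       -- list of word counts, one per vocabulary word
    class_params -- dict: class label -> list of (alpha, beta) per word
    log_priors   -- dict: class label -> log prior probability
    Words are assumed independent given the class (an assumption of
    this sketch).
    """
    best_label, best_score = None, -math.inf
    for label, params in class_params.items():
        score = log_priors[label]
        for x, (alpha, beta) in zip(counts, params):
            score += gamma_poisson_logpmf(x, alpha, beta)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy example with a two-word vocabulary and made-up parameters.
toy_params = {
    "a": [(5.0, 1.0), (0.5, 1.0)],  # class "a": first word is frequent
    "b": [(0.5, 1.0), (5.0, 1.0)],  # class "b": second word is frequent
}
toy_priors = {"a": math.log(0.5), "b": math.log(0.5)}
predicted = classify([6, 0], toy_params, toy_priors)
```

Working in log space keeps the per-word terms summable without underflow. The cost of scoring one document is linear in the vocabulary size times the number of classes.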