A New Text Mining Approach Based on HMM-SVM for Web News Classification
Author(s) - G. Krishnalal, S. Babu Rengarajan, K. G. Srinivasagan
Publication year - 2010
Publication title - International Journal of Computer Applications
Language(s) - English
Resource type - Journals
ISSN - 0975-8887
DOI - 10.5120/395-589
Subject(s) - computer science, support vector machine, hidden markov model, artificial intelligence, information retrieval, world wide web
Since the emergence of the World Wide Web, it has become essential to handle very large amounts of electronic data, most of which is text. Data mining techniques can handle this scenario effectively. This paper proposes an intelligent system for online news classification based on the Hidden Markov Model (HMM) and the Support Vector Machine (SVM). The system extracts keywords from online newspaper content and classifies the content into predefined categories in three stages: (1) text pre-processing, (2) HMM-based feature extraction, and (3) classification using SVM. Data for the experiments were collected from The Hindu, The New Indian Express, Times of India, Business Line, and The Economic Times. For the news categories sports, finance, and politics, the reported accuracies are 92.45%, 96.34%, and 90.76%, respectively. The authors report that these results compare favorably with those of other text classification methods.
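The three-stage pipeline named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say how the HMM-based feature extraction works, so stage 2 below plainly substitutes simple term counts, and stage 3 is a basic one-vs-rest linear SVM trained by subgradient descent on the hinge loss. All function names, the stop-word list, the hyperparameters, and the toy headlines are hypothetical and chosen only for illustration; none of them come from the paper or its data.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "on", "for"}

def preprocess(text):
    """Stage 1: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def extract_features(tokens, vocab):
    """Stage 2 stand-in: term-count vector (NOT the paper's HMM-based extractor)."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocab]

def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Stage 3: binary linear SVM (hinge loss + L2 penalty), labels in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge-loss subgradient step for a margin violation.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def train_one_vs_rest(docs, labels):
    """Train one binary SVM per category over a shared vocabulary."""
    vocab = sorted({t for d in docs for t in preprocess(d)})
    X = [extract_features(preprocess(d), vocab) for d in docs]
    models = {}
    for category in sorted(set(labels)):
        y = [1 if label == category else -1 for label in labels]
        models[category] = train_linear_svm(X, y)
    return vocab, models

def classify(text, vocab, models):
    """Return the category whose SVM gives the highest decision score."""
    x = extract_features(preprocess(text), vocab)
    def score(model):
        w, b = model
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=lambda category: score(models[category]))

# Invented toy headlines (not from the paper's data set).
docs = [
    "The team won the cricket match",
    "Striker scores a goal in the football final",
    "Stock market shares rise as bank profits grow",
    "Bank raises interest rates on loans",
    "Parliament passes the election bill",
    "Minister wins vote in the election",
]
labels = ["sports", "sports", "finance", "finance", "politics", "politics"]

vocab, models = train_one_vs_rest(docs, labels)
print(classify("cricket match final", vocab, models))
```

Keeping the three stages as separate functions means the term-count stand-in could be replaced by an actual HMM-based extractor without touching the pre-processing or classification code.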