Category Classification and Topic Discovery of Japanese and English News Articles
Author(s) -
David B. Bracewell,
Jiajun Yan,
Fuji Ren,
Shingo Kuroiwa
Publication year - 2009
Publication title -
electronic notes in theoretical computer science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.242
H-Index - 60
ISSN - 1571-0661
DOI - 10.1016/j.entcs.2008.12.066
Subject(s) - computer science , information retrieval , statistical classification , document classification , precision and recall , natural language processing , library classification , artificial intelligence , world wide web
This paper presents algorithms for topic analysis of news articles. Topic analysis entails category classification and topic discovery and classification. Dealing with news has special requirements that standard classification approaches typically cannot handle. The algorithms proposed in this paper are able to do online training for both category and topic classification as well as discover new topics as they arise. Both algorithms are based on a keyword extraction algorithm that is applicable to any language that has basic morphological analysis tools. As such, both the category classification and topic discovery and classification algorithms can be easily used by multiple languages. Through experimentation the algorithms are shown to have high precision and recall in tests on English and Japanese
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom