
Automated News Classification using N-gram Model and Key Features of Nepali Language
Author(s) -
Dinesh Dangol,
Rupesh Dahi Shrestha,
Arun Kumar Timalsina
Publication year - 2018
Publication title -
SCITECH Nepal
Language(s) - English
Resource type - Journals
ISSN - 2091-1742
DOI - 10.3126/scitech.v13i1.23504
Subject(s) - Nepali, computer science, natural language processing, n-gram, artificial intelligence, language model, information retrieval, linguistics
With more and more news published online, automatic text processing is becoming increasingly important. For decades, automatic text classification has been a focus of researchers working in many languages, and a large body of research exists on the features of the English language and their use in automated text processing. This research applies key features of the Nepali language to the automatic classification of Nepali news. Studying the impact of Nepali-specific features is particularly challenging, because these features differ greatly from those of English and involve a higher level of complexity. Experiments combining a vector space model, an n-gram model, and key-feature-based processing specific to Nepali show promising results compared with a bag-of-words model for automated Nepali news classification.
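The abstract contrasts a bag-of-words baseline with a vector space model built on n-gram features. The sketch below illustrates that contrast only. It assumes whitespace tokenization, treats the Devanagari danda "।" as punctuation, uses smoothed TF-IDF weighting, and classifies with a nearest-centroid cosine rule. The abstract specifies none of these choices. The Nepali-specific key features studied in the paper are not reproduced, and the function names, toy headlines, and labels are hypothetical.

```python
import math
from collections import Counter

DANDA = "\u0964"  # Devanagari danda "।", the Nepali full stop


def tokenize(text):
    """Whitespace tokenization; treating the danda as punctuation is an assumption."""
    return text.replace(DANDA, " ").split()


def ngrams(tokens, n_max):
    """All word n-grams for n = 1..n_max; n_max=1 is a plain bag-of-words."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


def fit_idf(docs, n_max):
    """Smoothed inverse document frequency for every n-gram seen in training."""
    df = Counter()
    for d in docs:
        df.update(set(ngrams(tokenize(d), n_max)))  # count each n-gram once per document
    n_docs = len(docs)
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def vectorize(text, idf, n_max):
    """Sparse TF-IDF vector (dict); n-grams unseen in training are dropped."""
    tf = Counter(ngrams(tokenize(text), n_max))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(docs, labels, n_max):
    """Nearest-centroid model: one averaged TF-IDF vector per label."""
    idf = fit_idf(docs, n_max)
    sums, sizes = {}, Counter(labels)
    for d, y in zip(docs, labels):
        acc = sums.setdefault(y, {})
        for t, w in vectorize(d, idf, n_max).items():
            acc[t] = acc.get(t, 0.0) + w
    centroids = {y: {t: w / sizes[y] for t, w in s.items()} for y, s in sums.items()}
    return centroids, idf


def classify(text, model, n_max):
    """Return the label whose centroid is most cosine-similar to the text."""
    centroids, idf = model
    v = vectorize(text, idf, n_max)
    return max(centroids, key=lambda y: cosine(v, centroids[y]))


# Hypothetical toy headlines, not data from the paper.
DOCS = [
    "नेपाली फुटबल टोली खेल जित्यो ।",
    "क्रिकेट टोली खेल हार्यो ।",
    "सरकारले संसदमा बजेट पेश गर्यो ।",
    "मन्त्री संसदमा बोले ।",
]
LABELS = ["sports", "sports", "politics", "politics"]

if __name__ == "__main__":
    for n_max in (1, 2):  # 1 = bag-of-words baseline, 2 = unigrams + bigrams
        model = train(DOCS, LABELS, n_max)
        # prints "1 sports politics" then "2 sports politics"
        print(n_max, classify("फुटबल टोली खेल", model, n_max),
              classify("संसदमा बजेट", model, n_max))
```

On data this small, both settings produce the same labels. Bigram features such as "टोली खेल" help only when adjacent-word context separates categories that shared unigrams alone cannot.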