Multi-Class Text Classification of Uzbek News Articles using Machine Learning
Author(s) - Ilyos Rabbimov, Sami Kobilov
Publication year - 2020
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1546/1/012097
Subject(s) - computer science , artificial intelligence , naive bayes classifier , support vector machine , machine learning , random forest , uzbek , natural language processing , multinomial logistic regression , class (philosophy) , decision tree , classifier (uml) , information retrieval , philosophy , linguistics
A large volume of online news on a wide range of topics is published on the Internet. One task in processing this data is to give users methods and tools for finding important and interesting news quickly and easily. One approach is to assign news items to appropriate classes, which increases the importance of automated classification of electronic documents. In this paper, we consider multi-class text classification for texts written in Uzbek. Articles in ten categories were collected from the Uzbek online news outlet “Daryo” and compiled into a dataset. Six machine learning algorithms were applied to multi-class classification on this dataset, including Support Vector Machines (SVM), Decision Tree Classifier (DTC), Random Forest (RF), Logistic Regression (LR), and Multinomial Naïve Bayes (MNB). The paper gives a detailed technical description of the stages of the proposed text classification scheme and of the software developed. TF-IDF weighting with word-level and character-level n-gram models was used for feature extraction. Hyperparameters were selected using 5-fold cross-validation. In the experiments, the highest accuracy achieved was 86.88%. The proposed models and methods can be applied to the classification of Uzbek-language texts and to further research in this area.
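The feature extraction and validation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the toy English corpus, the n-gram sizes, and the particular TF-IDF variant (idf = ln(N / df) + 1) are all assumptions chosen for illustration, and no classifier is trained here.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return word-level n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character-level n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf(docs_as_terms):
    """Weight each term by tf * idf, using idf = ln(N / df) + 1.

    This is one common TF-IDF variant (an assumption; the abstract does
    not state which formula was used).
    """
    n_docs = len(docs_as_terms)
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in docs_as_terms for term in set(terms))
    vectors = []
    for terms in docs_as_terms:
        tf = Counter(terms)
        vectors.append(
            {t: c * (math.log(n_docs / df[t]) + 1.0) for t, c in tf.items()}
        )
    return vectors


def k_fold_indices(n_samples, k=5):
    """Yield (train, validation) index lists for k contiguous folds.

    Real experiments would normally shuffle or stratify first; this
    sketch keeps the split deterministic.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


# Hypothetical toy corpus (not taken from the paper's dataset).
corpus = [
    "the team won the match",
    "the bank raised the rate",
    "the team lost the final",
    "the bank cut the rate",
    "the coach left the team",
]

word_vecs = tfidf([word_ngrams(doc, 1) for doc in corpus])  # word unigrams
char_vecs = tfidf([char_ngrams(doc, 3) for doc in corpus])  # character trigrams
folds = list(k_fold_indices(len(corpus), k=5))
```

In a full pipeline, each candidate hyperparameter setting would be scored on the five validation folds in turn, and the setting with the best average score would be kept.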
