Multi-Class Text Classification of Uzbek News Articles using Machine Learning

Open Access

Author(s) - Ilyos Rabbimov, Sami Kobilov

Publication year - 2020

Publication title - Journal of Physics: Conference Series

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.21

H-Index - 85

eISSN - 1742-6596

pISSN - 1742-6588

DOI - 10.1088/1742-6596/1546/1/012097

Subject(s) - computer science, artificial intelligence, naive bayes classifier, support vector machine, machine learning, random forest, uzbek, natural language processing, multinomial logistic regression, class (philosophy), decision tree, classifier (uml), information retrieval, philosophy, linguistics

A large volume of online news on a wide range of topics is posted on the Internet. One task in processing this data is to give users methods and tools for finding important and interesting news quickly and easily. One approach is to sort news items into appropriate classes, which makes automated classification of electronic documents increasingly important. In this paper, we consider the task of multi-class text classification for texts written in Uzbek. Articles in ten categories were collected from the Uzbek online news outlet “Daryo” and compiled into a dataset. Six machine learning algorithms were applied to multi-class classification of this dataset, among them Support Vector Machines (SVM), Decision Tree Classifier (DTC), Random Forest (RF), Logistic Regression (LR), and Multinomial Naïve Bayes (MNB). The stages of the proposed text classification scheme and the software developed for it are described in detail. TF-IDF weighting with word-level and character-level n-gram models was used for feature extraction, and hyperparameters were selected using 5-fold cross-validation. In the experiments, the highest accuracy achieved was 86.88%. The proposed models and methods can be applied to the classification of Uzbek-language texts and to further research in this area.
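The feature extraction and validation steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' software: the function names, the whitespace tokenization, the smoothed IDF formula, and the contiguous fold layout are all assumptions chosen for illustration, and the three short Uzbek strings are invented toy inputs, not items from the "Daryo" dataset.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Word-level n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character-level n-grams of a lowercased text (spaces included)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf(docs_terms):
    """TF-IDF weights for documents given as lists of terms.

    Assumed weighting (not taken from the paper): relative term frequency
    multiplied by a smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    """
    n_docs = len(docs_terms)
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))  # count each term once per document
    vectors = []
    for terms in docs_terms:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        vectors.append({
            t: (c / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, c in counts.items()
        })
    return vectors


def kfold_indices(n_samples, k=5):
    """Yield (train, validation) index lists for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


# Invented toy headlines (hypothetical, for illustration only).
docs = ["futbol jamoasi yutdi", "dollar kursi oshdi", "futbol oyini boshlandi"]
word_vectors = tfidf([word_ngrams(d, 1) for d in docs])
char_vectors = tfidf([char_ngrams(d, 3) for d in docs])
folds = list(kfold_indices(10, k=5))
```

In this sketch a term that occurs in fewer documents receives a higher weight than an equally frequent term that occurs in many, which is the property that makes TF-IDF useful for separating topics. In a cross-validated hyperparameter search, each candidate setting would be trained on the `train` indices and scored on the `val` indices of every fold, and the setting with the best mean score would be kept.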
