Open Access
Exploring multinomial naïve Bayes for Yorùbá text document classification
Author(s) - Ikechukwu Ignatius Ayogu
Publication year - 2020
Publication title - Nigerian Journal of Technology
Language(s) - English
Resource type - Journals
eISSN - 2467-8821
pISSN - 0331-8443
DOI - 10.4314/njt.v39i2.23
Subject(s) - bigram , trigram , natural language processing , computer science , artificial intelligence , naive bayes classifier , yoruba , representation (politics) , text categorization , categorization , linguistics , support vector machine , politics , philosophy , political science , law
The recent growth of Nigerian-language text online motivates this paper, which considers the problem of classifying text documents written in Yorùbá into one of a few pre-designated classes. Text document classification (categorization) research is well established for English and many other languages, but not for Nigerian languages. This paper evaluates a multinomial naïve Bayes (NB) model trained on a research dataset of 100 text samples from each of five domains (business, sports, entertainment, technology and politics). The model was trained separately on unigram, bigram and trigram features obtained with the bag-of-words (BoW) representation. Results show that the model performs comparably on unigram and bigram features, and significantly better on either than on trigram features. The results generally indicate that the NB algorithm can be applied in practice to the classification of text documents written in Yorùbá.
Keywords: supervised learning, text classification, Yorùbá language, text mining, BoW representation
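To make the setup described in the abstract concrete, the following is a minimal sketch of a multinomial naïve Bayes classifier over bag-of-n-grams counts. It is not the paper's implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the class and function names, and the tiny placeholder corpus are all assumptions made for illustration, and the placeholder documents are invented English tokens rather than samples from the paper's Yorùbá dataset.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Return the word n-grams of a whitespace-tokenised, lower-cased text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNB:
    """Multinomial naive Bayes with add-alpha smoothing (illustrative sketch)."""

    def __init__(self, n=1, alpha=1.0):
        self.n = n          # n-gram order: 1 = unigram, 2 = bigram, 3 = trigram
        self.alpha = alpha  # smoothing constant (assumed; not from the paper)

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)           # documents per class
        self.feature_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc, self.n)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.total_docs = len(labels)
        return self

    def log_scores(self, doc):
        # n-grams never seen in training are ignored.
        feats = [f for f in ngrams(doc, self.n) if f in self.vocab]
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            for f in feats:
                score += math.log(
                    (counts[f] + self.alpha) / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical placeholder corpus (invented tokens, not the paper's data).
train_docs = [
    "market price trade bank",
    "bank loan market profit",
    "match goal team coach",
    "team league goal score",
]
train_labels = ["business", "business", "sports", "sports"]

unigram_model = MultinomialNB(n=1).fit(train_docs, train_labels)
bigram_model = MultinomialNB(n=2).fit(train_docs, train_labels)
```

Changing `n` to 2 or 3 switches the same model to bigram or trigram features, which mirrors the three feature sets compared in the abstract. Higher-order n-grams are sparser, so with a small corpus fewer test n-grams will have been seen in training.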
