Developing an Efficient Text Pre-Processing Method with Sparse Generative Naive Bayes for Text Mining
Author(s) - Mrutyunjaya Panda
Publication year - 2018
Publication title - International Journal of Modern Education and Computer Science
Language(s) - English
Resource type - Journals
eISSN - 2075-017X
pISSN - 2075-0161
DOI - 10.5815/ijmecs.2018.09.02
Subject(s) - computer science, naive bayes classifier, generative grammar, bayes' theorem, artificial intelligence, generative model, machine learning, information retrieval, data mining, bayesian probability, support vector machine
With the explosive growth of the internet, a large amount of data is being collected in the form of text documents, which attracts many researchers to text mining. Traditional data mining methods struggle with the scale of text data. Such large-scale data can be handled using parallel computing frameworks such as Hadoop and MapReduce; however, these come with challenges of their own. On the other hand, Naive Bayes (NB) and its variant Multinomial Naive Bayes (MNB) play an important role in text mining because of their simplicity and robustness, but if any or all of the numbers of words, documents and labels grow beyond linear scaling, MNB becomes intractable and soon runs out of memory on a single computer. Considering the high-dimensional, sparse nature of the documents in text datasets, a scalable sparse generative Naive Bayes (SGNB) classifier is also proposed to develop a good text classification model. Unlike parallelization, SGNB reduces the time complexity non-linearly and is hence expected to provide the best results. In this paper, an efficient Lovins stemmer in combination with Snowball-based stopword calculation and a word tokenizer is proposed for text pre-processing. Extensive experiments conducted on well-known, publicly available text datasets demonstrate the effectiveness of the proposed approach in terms of accuracy, F-score and time in comparison with many baseline methods available in the recent literature.
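To make the idea of exploiting sparsity concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier that stores and visits only nonzero word counts. It is not the paper's SGNB algorithm or its pre-processing pipeline: the tiny `STOPWORDS` set, the regex tokenizer, the Laplace smoothing value `alpha` and the class name `SparseMultinomialNB` are all illustrative assumptions, and stemming (Lovins or otherwise) is omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not the Snowball-based list
# mentioned in the abstract).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on"}


def preprocess(text):
    """Lowercase, tokenize on runs of letters, and drop stopwords (no stemming)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class SparseMultinomialNB:
    """Multinomial Naive Bayes that keeps only nonzero word counts per class."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_docs = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # class -> {word: count}
        self.class_totals = Counter()            # total tokens per class
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            counts = Counter(preprocess(text))
            self.class_docs[label] += 1
            self.word_counts[label].update(counts)
            self.class_totals[label] += sum(counts.values())
            self.vocab.update(counts)
        return self

    def predict(self, text):
        counts = Counter(preprocess(text))
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.class_docs:
            denom = self.class_totals[label] + self.alpha * vocab_size
            score = math.log(self.class_docs[label] / n_docs)  # log prior
            # Loop only over words present in the document, not the vocabulary.
            for word, n in counts.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                numer = self.word_counts[label][word] + self.alpha
                score += n * math.log(numer / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In this sketch the per-class scoring cost grows with the number of distinct words in the document rather than with the vocabulary size, which is the basic benefit of a sparse representation; any further reduction described for SGNB is beyond what this example shows.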