Open Access
Python NLTK Sentiment Inspection using Naïve Bayes Classifier
Author(s) -
Y Jeevan,
Nagendra Kumar,
B.V.S.T. Sai,
Varagiri Shailaja,
S. Renuka,
Bharathi Panduri
Publication year - 2019
Publication title -
International Journal of Recent Technology and Engineering
Language(s) - English
Resource type - Journals
ISSN - 2277-3878
DOI - 10.35940/ijrte.b1328.0982s1119
Subject(s) - naive bayes classifier , computer science , support vector machine , sentiment analysis , python (programming language) , punctuation , information retrieval , artificial intelligence , set (abstract data type) , classifier (uml) , machine learning , natural language processing , data mining , programming language , operating system
The Web is one of the richest sources of consumer reviews and opinions. Many websites contain customer opinions in the form of reviews, blogs, discussion groups, and forums. This project focuses on customer reviews of restaurants and uses supervised machine learning techniques to predict whether a given comment is positive or negative. It uses a dataset from the Kaggle website that consists of comments together with the type of each comment (positive or negative), and it studies classification algorithms and text-mining approaches for identifying that type. First, duplicate entries are removed from the dataset. Text pre-processing follows: punctuation marks and stop words are removed, and the whole text is then converted into vector form. This conversion is essential because the analysis relies on linear algebra, so raw English text cannot be used directly; we use CountVectorizer to convert the text into vectors. The final step is classification, for which we use the Naive Bayes algorithm to assign each comment to one of the two classes mentioned above (positive or negative). Seventy percent of the data is used as the training set and 30 percent as the test set.
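The pre-processing steps in the abstract (duplicate removal, punctuation removal, stop-word removal, conversion to count vectors) can be sketched in plain Python as follows. This is an illustrative approximation, not the authors' code: they report using scikit-learn's CountVectorizer, whereas this sketch builds count vectors by hand. The short `STOP_WORDS` set and the sample reviews are hypothetical placeholders; the abstract does not say which stop-word list was used.

```python
import string
from collections import Counter

# Hypothetical, abbreviated stop-word list for illustration only; the abstract
# does not name the list used (NLTK's English stop-word list is a common choice).
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "or", "it", "this", "to", "of", "i", "we"}


def remove_duplicates(reviews):
    """Drop exact duplicate (comment, label) pairs, keeping first-seen order."""
    seen = set()
    unique = []
    for review in reviews:
        if review not in seen:
            seen.add(review)
            unique.append(review)
    return unique


def preprocess(text):
    """Lower-case the text, strip punctuation, and drop stop words."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted token order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def to_count_vector(tokens, vocab):
    """Count how often each vocabulary token occurs; unknown tokens are ignored."""
    vector = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            vector[vocab[tok]] += count
    return vector


# Hypothetical sample reviews (not taken from the Kaggle dataset).
reviews = [
    ("The food was great!", "positive"),
    ("The food was great!", "positive"),  # exact duplicate, removed below
    ("Service was slow, and the food was cold.", "negative"),
]
unique = remove_duplicates(reviews)
token_lists = [preprocess(text) for text, _ in unique]
vocab = build_vocabulary(token_lists)
vectors = [to_count_vector(tokens, vocab) for tokens in token_lists]
print(vocab)    # {'cold': 0, 'food': 1, 'great': 2, 'service': 3, 'slow': 4}
print(vectors)  # [[0, 1, 1, 0, 0], [1, 1, 0, 1, 1]]
```

Each row of `vectors` is one comment and each column counts one vocabulary word, which is the numeric form the classifier needs.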

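The classification step with a 70/30 train/test split can be sketched as a multinomial Naive Bayes classifier over word counts with add-one (Laplace) smoothing. The abstract does not state which Naive Bayes variant was used or whether the data was shuffled before splitting, so both the multinomial variant and the seeded shuffle are assumptions here; the names `train_test_split` and `MultinomialNaiveBayes` are hypothetical stand-ins written from scratch, and the toy reviews are invented for illustration.

```python
import math
import random
from collections import Counter


def train_test_split(data, train_fraction=0.7, seed=0):
    """Shuffle a copy of the data and split it (70/30 train/test by default)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        label_counts = Counter(labels)
        # log P(class): fraction of training comments with that label
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            word_counts[label].update(doc)
        v = len(self.vocab)
        # log P(word | class) with add-one smoothing so unseen pairs are not zero
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(word_counts[c].values())
            self.log_likelihood[c] = {
                tok: math.log((word_counts[c][tok] + 1) / (total + v)) for tok in self.vocab
            }
        return self

    def predict(self, doc):
        """Return the class with the highest log-posterior score for a token list."""
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # words never seen in training are skipped
                    score += self.log_likelihood[c][tok]
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy reviews, already lower-cased and stripped of punctuation/stop words.
toy = [
    ("great tasty food", "positive"),
    ("friendly staff great service", "positive"),
    ("loved dessert", "positive"),
    ("cold food slow service", "negative"),
    ("rude staff", "negative"),
    ("bland overpriced food", "negative"),
    ("great atmosphere", "positive"),
    ("slow rude waiter", "negative"),
    ("tasty fresh bread", "positive"),
    ("cold bland soup", "negative"),
]
train, test = train_test_split(toy)  # 7 training comments, 3 test comments
model = MultinomialNaiveBayes().fit([t.split() for t, _ in train], [lbl for _, lbl in train])
accuracy = sum(model.predict(t.split()) == lbl for t, lbl in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing keeps a single unseen word-class pair from forcing a class probability to zero. With only ten toy comments the printed accuracy says nothing about real performance.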