
Analyzing impact of number of features on efficiency of hybrid model of lexicon and stack based ensemble classifier for twitter sentiment analysis using WEKA tool
Author(s) - Sangeeta Rani, Nasib Singh Gill, Preeti Gulia
Publication year - 2021
Publication title - Indonesian Journal of Electrical Engineering and Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.241
H-Index - 17
eISSN - 2502-4760
pISSN - 2502-4752
DOI - 10.11591/ijeecs.v22.i2.pp1041-1051
Subject(s) - lexicon , computer science , classifier (uml) , artificial intelligence , feature selection , feature (linguistics) , sentiment analysis , machine learning , data mining , pattern recognition (psychology) , philosophy , linguistics
Twitter is used by millions of people across the world, so data collected from Twitter can be highly valuable for research and for decision support. In this paper, the ‘Twitter US Airline’ dataset from the Kaggle data repository is used for sentiment classification of customer reviews. The research implements various machine learning classifiers, stack-based ensemble classifiers, and hybrids of a lexicon-based classifier with other classifiers. Eleven classification models are implemented for feature sets of different sizes. All 11 models are then re-implemented with the sentiment score of the lexicon-based classifier added as one of the features in the feature set. Results are analyzed by varying the number of input features used in classification: four feature sets, with 301, 501, 701, and 1301 features, are used to examine how the findings change. Chi-square and information gain techniques are used for feature selection. The results show that accuracy increases with the number of features up to 701 features; beyond that, accuracy remains stable or decreases as the feature set grows. Adding the lexicon classifier's sentiment score to the input feature set costs little, yet it improves the results consistently. WEKA and RStudio are used for implementation and analysis. Accuracy and the Kappa statistic are used to report and compare the efficiency of the models.
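To show what ranking features by information gain involves, here is a minimal Python sketch. It assumes each tweet is represented as a set of tokens and that every feature is the binary "term occurs in the tweet" indicator. It is the textbook definition IG(C; F) = H(C) − Σ_v P(F = v) H(C | F = v), not the WEKA implementation used in the paper. The function names and the toy tweets are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Expected class entropy after splitting on presence/absence of the term.
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy tweets, already tokenized into sets of words.
docs = [
    {"great", "flight"},
    {"great", "crew"},
    {"delayed", "flight"},
    {"delayed", "bags"},
]
labels = ["positive", "positive", "negative", "negative"]

print(top_k_terms(docs, labels, 2))
```

In this toy data, "great" and "delayed" each separate the two classes perfectly, so they receive the maximum gain of 1 bit. "flight" appears equally often in both classes and scores 0. Selecting a feature set of a given size then amounts to choosing k in `top_k_terms`.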