Enriching datasets for sentiment analysis in tweets with instance selection

Author(s) - Eliseu Guimarães, Daniela Vianna, Aline Paes, Alexandre Plastino

Open Access

Publication year - 2021

Language(s) - English

Resource type - Conference proceedings

DOI - 10.5753/kdmile.2021.17463

Subject(s) - computer science, leverage (statistics), classifier (uml), popularity, artificial intelligence, machine learning, sentiment analysis, data mining, field (mathematics), selection (genetic algorithm), set (abstract data type), training set, information retrieval, psychology, social psychology, mathematics, pure mathematics, programming language

Sentiment analysis in tweets is an important research field, mainly due to the popularity of Twitter. However, collecting and annotating tweets is expensive and time-consuming, so some domains have only a limited set of labeled data. A promising strategy to handle this issue is to leverage labeled domains rich in data, selecting instances from them to enrich target datasets. This paper proposes strategies for selecting instances from a set of labeled source datasets in order to improve the performance of classifiers trained only with the target dataset. The strategies vary the similarity metric and the number of selected instances. The results show that the size of the training set plays an essential role in the predictive capacity of the classifier. Furthermore, the results point to the importance of taking diversity criteria into account when selecting the instances.
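As a rough illustration of similarity-based instance selection, the sketch below ranks labeled source tweets by cosine similarity to a bag-of-words centroid of the target dataset and keeps the top k. This is not the paper's method: the abstract does not specify its similarity metrics or selection procedure, so the function names (`bow`, `cosine`, `select_instances`), the whitespace tokenization, the centroid ranking, and the toy tweets are all assumptions made for illustration. The sketch also ignores the diversity criteria the abstract highlights.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase, whitespace-tokenized term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_instances(target, source, k):
    """Return the k source (text, label) pairs most similar to the target centroid.

    Hypothetical helper: ranking by similarity to a single centroid is one
    simple choice among many and is assumed here, not taken from the paper.
    """
    centroid = Counter()
    for text, _label in target:
        centroid.update(bow(text))
    ranked = sorted(
        source,
        key=lambda inst: cosine(bow(inst[0]), centroid),
        reverse=True,
    )
    return ranked[:k]


# Toy data (invented for illustration only).
target = [("great phone love it", "pos"), ("battery is terrible", "neg")]
source = [
    ("love this great camera", "pos"),
    ("the movie was boring", "neg"),
    ("terrible battery life", "neg"),
]

selected = select_instances(target, source, k=2)
enriched = target + selected  # training set = target data plus selected source instances
```

Varying `k` changes the size of the enriched training set, which the abstract reports as an important factor. A diversity-aware variant could, for example, penalize candidates that are too similar to instances already selected.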
