
Iterative machine learning applied to annotation of text datasets
Author(s) - Thiago Abdo, Fabiano Silva
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/eniac.2021.18268
Subject(s) - computer science, artificial intelligence, machine learning, deep learning, online machine learning, computational learning theory, support vector machine, active learning (machine learning), annotation, instance based learning
This paper analyzes different machine learning approaches and algorithms for integration as automated assistance in a tool that aids the creation of new annotated datasets. We evaluate how they scale in an environment without dedicated machine learning hardware. In particular, we study their behavior on a dataset with few examples and on one that is still being constructed. We experiment with deep learning algorithms (BERT) and with classical learning algorithms of lower computational cost (W2V and GloVe embeddings combined with RF and SVM classifiers). Our experiments show that deep learning algorithms have a performance advantage over classical techniques. However, their high computational cost makes them unsuitable for an environment with reduced hardware resources. We also run simulations that use active and iterative machine learning techniques to assist the creation of new datasets. For these simulations, we use the classical learning algorithms because of their lower computational cost. The knowledge gathered in our experimental evaluation is intended to support the creation of a tool for building new text datasets.
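
The iterative annotation simulation described in the abstract can be illustrated with a pool-based active learning loop. The sketch below is only an illustration under stated assumptions and is not the authors' implementation: uncertainty sampling is an assumed query strategy, a nearest-centroid classifier stands in for the RF and SVM models, random 2-D points stand in for W2V or GloVe text embeddings, and every name (`make_toy_pool`, `fit`, `predict_with_margin`, `simulate_annotation`) is hypothetical.

```python
import math
import random


def make_toy_pool(n_per_class=100, seed=0):
    """Toy pool: two Gaussian clusters standing in for embedded texts (assumption)."""
    rng = random.Random(seed)
    pool, labels = [], []
    for label, center in ((0, 0.0), (1, 6.0)):
        for _ in range(n_per_class):
            pool.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            labels.append(label)
    return pool, labels


def fit(vectors, labels):
    """Stand-in classifier (not RF/SVM): one mean vector per label."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict_with_margin(model, vec):
    """Return (predicted label, margin); a small margin means an uncertain prediction."""
    dists = sorted((math.dist(vec, c), label) for label, c in model.items())
    best_dist, best_label = dists[0]
    runner_up = dists[1][0] if len(dists) > 1 else math.inf
    return best_label, runner_up - best_dist


def simulate_annotation(pool, oracle, seed_indices, rounds=5, batch_size=4):
    """Retrain each round, then ask the oracle (the annotator) for the least certain items."""
    labeled = {i: oracle(i) for i in seed_indices}
    for _ in range(rounds):
        model = fit([pool[i] for i in labeled], list(labeled.values()))
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Uncertainty sampling: smallest margin first.
        unlabeled.sort(key=lambda i: predict_with_margin(model, pool[i])[1])
        for i in unlabeled[:batch_size]:
            labeled[i] = oracle(i)
    # Final model trained on everything annotated so far.
    model = fit([pool[i] for i in labeled], list(labeled.values()))
    return model, labeled


if __name__ == "__main__":
    pool, truth = make_toy_pool()
    # The oracle simulates a human annotator by looking up the true label.
    model, labeled = simulate_annotation(pool, lambda i: truth[i], [0, 1, 100, 101])
    correct = sum(predict_with_margin(model, v)[0] == t for v, t in zip(pool, truth))
    print(f"labeled {len(labeled)} of {len(pool)}; accuracy {correct / len(pool):.2f}")
```

In a real annotation tool the oracle would be the human annotator, and the retraining step is where the computational cost discussed in the abstract matters, since the model is refit after every batch of new labels.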