
Applying Data Augmentation for Disambiguating Author Names
Author(s) - Luciano V. B. Espiridião, Laura Lima Dias, Anderson A. Ferreira
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/sbbd.2021.17870
Subject(s) - computer science, task (project management), ambiguity, set (abstract data type), information retrieval, function (biology), digital library, training set, natural language processing, quality (philosophy), data set, compromise, artificial intelligence, linguistics, social science, philosophy, management, poetry, epistemology, evolutionary biology, sociology, economics, biology, programming language
Author name ambiguity is one of the most challenging issues that can compromise information quality in a scholarly digital library. For years, researchers have searched for solutions to this problem. Despite the many methods already proposed, the question remains open. In this study, we address the issue of producing a more accurate disambiguation function by applying data augmentation to the training data set. We also propose a SyGAR-based data augmentation approach and evaluate our proposal on three collections commonly used in work on the author name disambiguation task. The experimental results reveal scenarios in which improvements are possible in the author name disambiguation task. The proposed data augmentation approach outperforms another data augmentation approach and also improves some machine learning techniques that were not specifically designed for the author name disambiguation task.
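
To make the idea of augmenting the training set concrete, the sketch below shows one simple way synthetic authorship records could be added to a labeled training set. It is an illustration only, not the authors' approach and not SyGAR itself: the `Record` fields, the `augment` function, and the sampling scheme (recombining coauthors, title terms, and venues already seen for the same author) are all assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical authorship record; the field names are assumptions."""
    author_id: str       # ground-truth label of the ambiguous author
    coauthors: tuple     # names of coauthors
    title_terms: tuple   # terms from the publication title
    venue: str           # publication venue


def augment(records, per_author=2, seed=0):
    """Return the original records plus synthetic ones for each author.

    Each synthetic record recombines coauthors, title terms, and a venue
    drawn from the records already labeled with the same author.
    """
    rng = random.Random(seed)

    # Group the labeled training records by author.
    by_author = {}
    for rec in records:
        by_author.setdefault(rec.author_id, []).append(rec)

    synthetic = []
    for author_id, recs in by_author.items():
        coauthor_pool = sorted({c for r in recs for c in r.coauthors})
        term_pool = sorted({t for r in recs for t in r.title_terms})
        venue_pool = [r.venue for r in recs]
        for _ in range(per_author):
            # Use a real record as a template for the field sizes.
            template = rng.choice(recs)
            n_co = min(len(template.coauthors), len(coauthor_pool))
            n_terms = min(len(template.title_terms), len(term_pool))
            synthetic.append(Record(
                author_id=author_id,
                coauthors=tuple(rng.sample(coauthor_pool, n_co)),
                title_terms=tuple(rng.sample(term_pool, n_terms)),
                venue=rng.choice(venue_pool),
            ))
    return list(records) + synthetic


# Made-up example data, for illustration only.
train = [
    Record("a1", ("B. Silva",), ("name", "disambiguation"), "SBBD"),
    Record("a1", ("C. Souza", "B. Silva"), ("digital", "library"), "JCDL"),
    Record("a2", ("D. Lee",), ("protein", "folding"), "ISMB"),
]
augmented = augment(train, per_author=2, seed=1)
print(len(train), "->", len(augmented))
```

The augmented list would then be passed to whatever supervised learner builds the disambiguation function. Whether such synthetic records help in practice depends on the collection and the learner, which is the question the study evaluates.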