
Assessing the Impact of Stemming Algorithms Applied to Brazilian Legislative Documents Retrieval
Author(s) -
Ellen Souza,
Gyovana Moriyama,
Douglas Vitório,
André C. P. L. F. de Carvalho,
Nádia Félix,
Hidelberg Oliveira Albuquerque,
Adriano L. I. Oliveira
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/stil.2021.17802
The main purpose of stemming is to reduce inflected words to their root form, or stem. Words can thus be mapped to the same concept, which can improve information retrieval by making document indexing more effective and reducing data dimensionality. However, the performance of stemming algorithms varies with several factors, and studies in the field have reached contrasting conclusions. This work assesses the use of stemmers in the retrieval of legislative documents written in Portuguese. Four stemmers, combined with the BM25 ranking function, were evaluated on two legislative corpora from the Brazilian Chamber of Deputies. The RSLP-S and Savoy stemmers yielded the largest improvements in the information retrieval pipeline.
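The following is a minimal sketch, not the authors' pipeline, of how stemming interacts with BM25 ranking. The suffix-stripping function `toy_stem` and its suffix list are hypothetical and are NOT the RSLP-S, Savoy, or any other stemmer evaluated in the paper. The BM25 parameters `k1 = 1.2`, `b = 0.75` and the non-negative IDF variant are common defaults assumed here, and the three Portuguese sentences are invented rather than drawn from the Chamber of Deputies corpora.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional

# HYPOTHETICAL toy suffix list for Portuguese, ordered longest-first.
# This is NOT the RSLP, RSLP-S or Savoy rule set evaluated in the paper.
TOY_SUFFIXES = ["idades", "mente", "idade", "ções", "ção", "as", "os", "es", "a", "o", "s"]


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first (longest) matching suffix, keeping at least min_stem characters."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> List[str]:
    """Lowercase and split into Unicode-aware word tokens."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75,
                stem: Optional[Callable[[str], str]] = None) -> List[float]:
    """Score every document against the query with Okapi BM25.

    k1, b and the non-negative IDF variant below are assumed defaults,
    not settings reported by the paper.
    """
    norm = stem if stem is not None else (lambda t: t)
    tokenized = [[norm(t) for t in tokenize(d)] for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each (stemmed) term.
    df = Counter(t for d in tokenized for t in set(d))
    q_terms = [norm(t) for t in tokenize(query)]

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = 1.0 - b + b * len(doc) / avgdl
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Invented example sentences, not taken from the Chamber of Deputies corpora.
docs = [
    "Projeto de lei sobre a legislação ambiental",
    "Os deputados aprovaram as legislações tributárias",
    "Sessão da Câmara sem votação",
]
query = "legislação tributária"

print("no stemming:  ", [round(s, 3) for s in bm25_scores(query, docs)])
print("toy stemming: ", [round(s, 3) for s in bm25_scores(query, docs, stem=toy_stem)])
```

With these three sentences, the unstemmed run gives the second document a score of zero, because neither "legislações" nor "tributárias" matches the query forms exactly. With `toy_stem`, both words collapse onto the query stems and that document is ranked first. Real stemmers can also conflate unrelated words (over-stemming), so such gains are not guaranteed.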