Open Access
Analysis of a Brazilian Indigenous corpus using machine learning methods
Author(s) - Tiago Barbosa de Lima, André Nascimento, Péricles Miranda, Rafael Ferreira Mello
Publication year - 2021
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/eniac.2021.18246
Subject(s) - indigenous, computer science, identification (biology), artificial intelligence, natural language processing, documentation, key (lock), machine translation, ecology, botany, computer security, biology, programming language
In Brazil, several minority languages face a serious risk of extinction. Appropriate documentation of these languages is a fundamental step toward preventing it. However, for some of them, only a small amount of text is digitally accessible. Meanwhile, many open issues remain in the identification of indigenous languages, a task that may help to identify key similarities among them and to connect related languages and dialects. Therefore, this paper proposes to study and automatically classify 26 neglected Brazilian native languages, using a small amount of training data, under both supervised and unsupervised settings. Our findings indicate that the use of machine learning models for the analysis of Brazilian Indigenous corpora is very promising, and we hope this work encourages more research on this topic in the coming years.
