
Extracting Parallel Sentences from Low-Resource Language Pairs with Minimal Supervision
Author(s) - Xiayang Shi, Xinyi Liu, Zhenqiang Yu, Pei Tao Cheng, Xu Chun
Publication year - 2022
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2171/1/012044
Subject(s) - computer science, machine translation, sentence, natural language processing, artificial intelligence, parallel corpora, classifier (uml), word (group theory), translation (biology), linguistics, philosophy, biochemistry, chemistry, messenger rna, gene
Current machine translation systems depend on parallel sentence corpora, and the number of available parallel sentences directly affects translation performance, especially for low-resource language pairs. In recent years, learning cross-lingual word representations from non-parallel corpora has offered a new way to obtain bilingual sentence pairs with few resources and little supervision. In this paper, we propose a method along these lines. First, we create cross-domain mappings from a small amount of monolingual data. Then, we construct a classifier to extract bilingual parallel sentence pairs. Finally, we demonstrate the effectiveness of our method on the low-resource Uyghur-Chinese language pair through machine translation experiments, where it achieves good results.
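
The two steps named in the abstract (map one language's word vectors into the other's space, then decide which candidate sentence pairs are parallel) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the mapping matrix `W` has already been learned, represents a sentence as the average of its word vectors, and substitutes a plain cosine-similarity threshold for the trained classifier the paper describes. All names (`apply_map`, `sentence_vector`, `extract_pairs`, `threshold`) and the toy embeddings are hypothetical.

```python
import math

def apply_map(W, v):
    """Project vector v into the target space; W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sentence_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def extract_pairs(src_sents, tgt_sents, src_emb, tgt_emb, W, threshold=0.8):
    """Pair each source sentence with its most similar target sentence and
    keep the pair only if the similarity reaches the threshold.
    ASSUMPTION: the threshold stands in for a trained classifier."""
    if not tgt_sents:
        return []
    dim = len(W)  # number of rows in W = dimension of the target space
    mapped_src = {w: apply_map(W, v) for w, v in src_emb.items()}
    tgt_vecs = [sentence_vector(t, tgt_emb, dim) for t in tgt_sents]
    pairs = []
    for s in src_sents:
        sv = sentence_vector(s, mapped_src, dim)
        scores = [cosine(sv, tv) for tv in tgt_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((s, tgt_sents[best], scores[best]))
    return pairs

# Toy data (hypothetical): 2-d embeddings, and a mapping that swaps the axes.
src_emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
tgt_emb = {"x": [0.0, 1.0], "y": [1.0, 0.0]}
W = [[0.0, 1.0], [1.0, 0.0]]
src_sents = [["a", "a"], ["b"], ["unknown"]]
tgt_sents = [["x"], ["y"]]

pairs = extract_pairs(src_sents, tgt_sents, src_emb, tgt_emb, W)
```

In this toy setup the sentence made only of out-of-vocabulary tokens is dropped, because its zero vector scores 0.0 against every candidate. A real pipeline would replace the threshold with a classifier trained on the small supervised seed set and would restrict the candidate search rather than compare every source sentence with every target sentence.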