Open Access
Tibetan-Chinese cross-lingual word embeddings based on MUSE
Author(s) -
Wei Ma,
Hongzhi Yu,
Kun Zhao,
Debin Zhao,
Jun Yang
Publication year - 2020
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1453/1/012043
Subject(s) - computer science , natural language processing , artificial intelligence , word embedding , embedding , semantics (computer science) , linguistics
The idea of word embedding rests on the distributional hypothesis of the linguist Harris (1954), which holds that words with similar meanings occur in similar contexts. Learning word embeddings is a core technology in natural language processing, and cross-lingual word embeddings have attracted increasing attention in recent years because they can transfer knowledge between languages, most importantly from high-resource to low-resource languages. This paper trains monolingual word vectors on Tibetan and Chinese Wikipedia corpora and then, based on Tibetan-Chinese bilingual translation pairs, applies the supervised method in the MUSE library to learn Tibetan-Chinese cross-lingual word embeddings. In the experiments, the resulting word representations are evaluated on a standard lexical semantic evaluation task, and the results show that the method improves the semantic representation of the word embeddings.
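
The supervised mode of the MUSE toolkit aligns two pretrained monolingual embedding spaces with an orthogonal linear mapping learned from a bilingual dictionary (the closed-form Procrustes solution, optionally refined iteratively). The sketch below is a minimal Python/NumPy illustration of that alignment step only; the embedding matrices, dimensions, and variable names are hypothetical placeholders rather than the paper's Tibetan or Chinese data, and in practice the monolingual vectors would first be trained on the Wikipedia corpora (e.g. with fastText) before alignment.

```python
import numpy as np

def procrustes(src, tgt):
    """Orthogonal mapping W minimizing ||src @ W.T - tgt||_F
    (the Procrustes solution used for supervised alignment)."""
    # src, tgt: (n_pairs, dim) vectors of the dictionary word pairs
    u, _, vt = np.linalg.svd(tgt.T @ src)
    return u @ vt  # W has shape (dim, dim) and is orthogonal

# Hypothetical stand-ins for monolingual embeddings of the entries of a
# Tibetan-Chinese translation dictionary (replace with real vectors).
dim, n_pairs = 300, 5000
rng = np.random.default_rng(0)
tib_vecs = rng.normal(size=(n_pairs, dim))  # Tibetan source-side vectors
zh_vecs = rng.normal(size=(n_pairs, dim))   # Chinese target-side vectors

W = procrustes(tib_vecs, zh_vecs)
tib_aligned = tib_vecs @ W.T  # Tibetan vectors mapped into the Chinese space
```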
