Open Access
Probing language identity encoded in pre-trained multilingual models: a typological view
Author(s) - JianYu Zheng, Ying Li
Publication year - 2022
Publication title - PeerJ Computer Science
Language(s) - English
Resource type - Journals
ISSN - 2376-5992
DOI - 10.7717/peerj-cs.899
Subject(s) - computer science, identity (music), natural language processing, encoding (memory), language model, artificial intelligence, linguistics, philosophy, physics, acoustics
Pre-trained multilingual models have been extensively used in cross-lingual information processing tasks. Existing work focuses on improving the transfer performance of pre-trained multilingual models but ignores the linguistic properties that the models preserve at encoding time, which we call "language identity". We investigated the capability of state-of-the-art pre-trained multilingual models (mBERT, XLM, XLM-R) to preserve language identity through the lens of language typology, and we examined differences between models and variations across languages, typological features, and internal hidden layers. We found that the models, both as a whole and layer by layer, rank as follows in their ability to preserve language identity: mBERT > XLM-R > XLM. Furthermore, all three models capture morphological, lexical, word order, and syntactic features well, but perform poorly on nominal and verbal features. Finally, our results show that the ability of XLM-R and XLM remains stable across layers, whereas the ability of mBERT fluctuates severely. Our findings summarize how well each pre-trained multilingual model and each of its hidden layers stores language identity and typological features, and they provide insights for later researchers working on cross-lingual information processing.
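To make the layer-wise probing idea concrete, the sketch below trains a simple classifier on frozen hidden-layer representations of mBERT to predict a typological property of the input language. This is only an illustration of the general technique, not the paper's actual protocol: the model name bert-base-multilingual-cased (via Hugging Face transformers), the LogisticRegression probe, the toy sentences, and the word-order labels are all assumptions introduced here for demonstration.

```python
# Layer-wise probing sketch: classify a typological feature (toy word-order
# labels) from frozen hidden states of mBERT. Illustrative only; the paper's
# exact data, features, and probing setup are not reproduced here.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

MODEL_NAME = "bert-base-multilingual-cased"  # mBERT; swap in XLM or XLM-R to compare
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()

# Toy (sentence, label) pairs; labels are illustrative, real probes use far more data.
examples = [
    ("The cat chased the mouse.", "SVO"),
    ("Der Hund hat die Katze gesehen.", "SOV"),
    ("She reads a book every evening.", "SVO"),
    ("Ich habe das Buch gelesen.", "SOV"),
] * 25  # repeat so the classifier has something to fit

def sentence_embeddings(texts, layer):
    """Mean-pooled token embeddings from one hidden layer, gradients disabled."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer]   # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)       # zero out padding tokens
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

texts, labels = zip(*examples)
for layer in range(model.config.num_hidden_layers + 1):  # layer 0 = embeddings
    X = sentence_embeddings(list(texts), layer)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"layer {layer:2d}: probe accuracy = {probe.score(X_te, y_te):.2f}")
```

Running the same loop for XLM and XLM-R checkpoints and comparing per-layer probe accuracy is one way to reproduce the kind of layer-wise comparison the abstract describes.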
