What Do Language Representations Really Represent?
Author(s) -
Johannes Bjerva,
Robert Östling,
Maria Han Veiga,
Jörg Tiedemann,
Isabelle Augenstein
Publication year - 2019
Publication title -
computational linguistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.314
H-Index - 98
eISSN - 1530-9312
pISSN - 0891-2017
DOI - 10.1162/coli_a_00351
Subject(s) - computer science , similarity (geometry) , natural language processing , artificial intelligence , representation (politics) , linguistics , word (group theory) , semantic similarity , machine translation , philosophy , politics , political science , law , image (mathematics)
A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just like it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation s...
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom