Inverted indexing for cross-lingual NLP
Author(s) -
Anders Søgaard,
Željko Agić,
Héctor Martínez Alonso,
Barbara Plank,
Bernd Bohnet,
Anders Johannsen
Publication year - 2015
Language(s) - English
Resource type - Conference proceedings
DOI - 10.3115/v1/p15-1165
Subject(s) - search engine indexing, computer science, parsing, dependency grammar, artificial intelligence, natural language processing, word (group theory), dependency (uml), simple (philosophy), information retrieval, mathematics, epistemology, geometry, philosophy
We present a novel, count-based approach to obtaining inter-lingual word representations based on inverted indexing of Wikipedia. We present experiments applying these representations to 17 datasets in document classification, POS tagging, dependency parsing, and word alignment. Our approach has the advantage that it is simple, computationally efficient and almost parameter-free, and, more importantly, it enables multi-source cross-lingual learning. In 14/17 cases, we improve over using state-of-the-art bilingual embeddings.
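The abstract does not spell out the construction, so the following is only a minimal sketch of what count-based representations from an inverted index could look like. It assumes that articles in different languages are linked to shared concept IDs, so that every word in every language becomes a vector of counts over the same concepts. The toy data and the names `build_inverted_index`, `to_vector`, and `cosine` are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def build_inverted_index(concept_docs):
    """Map each (language, word) pair to its counts over shared concept IDs.

    concept_docs: dict of concept ID -> {language code: article text}.
    Assumption: articles on the same concept are linked across languages.
    """
    index = defaultdict(Counter)
    for concept_id, texts in concept_docs.items():
        for lang, text in texts.items():
            for token in text.lower().split():
                index[(lang, token)][concept_id] += 1
    return index


def to_vector(index, key, concept_ids):
    """Return the count vector of a (language, word) key over concept_ids."""
    counts = index.get(key, Counter())
    return [counts[c] for c in concept_ids]


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy "Wikipedia": three concepts, each with an English
# and a German article.
concept_docs = {
    "C1": {"en": "the dog barks", "de": "der hund bellt"},
    "C2": {"en": "the cat sleeps", "de": "die katze schläft"},
    "C3": {"en": "the dog sleeps", "de": "der hund schläft"},
}

index = build_inverted_index(concept_docs)
concepts = sorted(concept_docs)

dog = to_vector(index, ("en", "dog"), concepts)
hund = to_vector(index, ("de", "hund"), concepts)
katze = to_vector(index, ("de", "katze"), concepts)

print(cosine(dog, hund))   # words sharing the same concepts score high
print(cosine(dog, katze))  # words with no shared concepts score 0.0
```

Because all languages share one concept space in this sketch, adding a further source language only adds more keys to the same index, which is one way to read the abstract's point about multi-source cross-lingual learning. Any weighting or dimensionality reduction applied to the raw counts is left out here.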