Automatic Detection and Language Identification of Multilingual Documents

Open Access

Author(s) - Marco Lui, Jey Han Lau, Timothy Baldwin

Publication year - 2014

Publication title - Transactions of the Association for Computational Linguistics

Language(s) - English

Resource type - Journals

ISSN - 2307-387X

DOI - 10.1162/tacl_a_00163

Subject(s) - computer science, natural language processing, identification (biology), task (project management), language identification, artificial intelligence, information retrieval, natural language, botany, management, economics, biology

Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document. In this work, we address the problem of detecting documents that contain text from more than one language (multilingual documents). We introduce a method that is able to detect that a document is multilingual, identify the languages present, and estimate their relative proportions. We demonstrate the effectiveness of our method over synthetic data, as well as real-world multilingual documents collected from the web.
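
To make the three outputs named in the abstract concrete (a multilingual flag, the set of languages present, and their relative proportions), here is a toy sketch in Python. It is not the method introduced in the paper, which this abstract does not describe. It swaps in a much cruder technique: each line is labelled by counting hits against tiny stopword lists, and the shares are weighted by token count. The stopword lists, the function names `guess_language` and `language_proportions`, and the `min_share` threshold are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword lists (illustration only).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in"},
    "fr": {"le", "la", "et", "est", "les", "des"},
    "de": {"der", "die", "und", "ist", "das", "nicht"},
}


def guess_language(tokens):
    """Return the language whose stopwords occur most often, or None."""
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def language_proportions(document, min_share=0.1):
    """Label each line, then report token-weighted shares per language."""
    counts = Counter()
    for line in document.splitlines():
        tokens = line.lower().split()
        lang = guess_language(tokens)
        if lang is not None:
            counts[lang] += len(tokens)
    total = sum(counts.values())
    shares = {lang: n / total for lang, n in counts.items()} if total else {}
    # Ignore languages below the (assumed) minimum share as likely noise.
    present = {lang: share for lang, share in shares.items() if share >= min_share}
    return {"multilingual": len(present) > 1, "proportions": present}


if __name__ == "__main__":
    doc = "the cat is in the garden\nle chat est dans le jardin"
    print(language_proportions(doc))
```

Labelling whole lines assumes that languages do not mix within a line, which real web documents often violate. A realistic system would need a far larger language inventory and a principled way to estimate proportions.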
