Cross-lingual text classification with model translation and document translation
Author(s) -
Teng-Sheng Moh,
Zhang Zhang
Publication year - 2012
Publication title -
San José State University ScholarWorks (San Jose State University)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/2184512.2184530
Subject(s) - computer science, natural language processing, machine translation, artificial intelligence, classifier (uml), translation (biology), example based machine translation, categorization, computer assisted translation, language translation, text categorization, machine translation software usability, language model, information retrieval, biochemistry, chemistry, messenger rna, gene
Text classification typically assumes that all documents are written in the same language, so a model trained on one language will not work when the classifier must categorize documents in other languages. The most direct solution is to translate all documents from the other languages into a single language with a machine translator (document translation). Another approach is to translate the features extracted from one language into a second language and use them to classify documents in that second language (model translation). In this paper, the authors propose a new method that adopts both model translation and document translation, taking advantage of the strengths of each.
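The two baseline strategies named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy keyword model `EN_MODEL`, the word-for-word dictionaries `EN_TO_ES` and `ES_TO_EN` that stand in for a machine translator, and in particular the score-averaging step in `classify_combined`. The abstract does not say how the authors combine the two methods, so the averaging is only one plausible reading, not the paper's algorithm.

```python
from collections import Counter

# Hypothetical toy "model": per-class feature weights, as if learned from English documents.
EN_MODEL = {
    "sports": {"match": 1.0, "team": 1.0, "goal": 1.0},
    "finance": {"bank": 1.0, "market": 1.0, "loan": 1.0},
}

# Hypothetical bilingual dictionary standing in for a real machine translator.
EN_TO_ES = {
    "match": "partido", "team": "equipo", "goal": "gol",
    "bank": "banco", "market": "mercado", "loan": "préstamo",
}
ES_TO_EN = {es: en for en, es in EN_TO_ES.items()}


def score(model, tokens):
    """Sum the weights of the model features that occur in the token list."""
    counts = Counter(tokens)
    return {
        label: sum(weight * counts[feature] for feature, weight in weights.items())
        for label, weights in model.items()
    }


def translate_document(tokens, dictionary):
    """Document translation: map every token, keeping unknown tokens unchanged."""
    return [dictionary.get(token, token) for token in tokens]


def translate_model(model, dictionary):
    """Model translation: map every feature, dropping features with no translation."""
    return {
        label: {dictionary[f]: w for f, w in weights.items() if f in dictionary}
        for label, weights in model.items()
    }


def classify_combined(tokens_es):
    """Average the scores of both strategies (an assumed combination rule)."""
    doc_scores = score(EN_MODEL, translate_document(tokens_es, ES_TO_EN))
    model_scores = score(translate_model(EN_MODEL, EN_TO_ES), tokens_es)
    combined = {
        label: (doc_scores[label] + model_scores[label]) / 2 for label in EN_MODEL
    }
    return max(combined, key=combined.get)


doc = "el equipo marcó un gol en el partido".split()
print(classify_combined(doc))
```

In this sketch, document translation moves the Spanish text toward the English model, while model translation moves the English features toward the Spanish text. A real system would replace the dictionaries with a machine translation service and the keyword weights with a trained classifier.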