Open Access
POS Tagging of English-Hindi Code-Mixed Social Media Content
Author(s) -
Yogarshi Vyas,
Spandana Gella,
Jatin Sharma,
Kalika Bali,
Monojit Choudhury
Publication year - 2014
Language(s) - English
Resource type - Conference proceedings
DOI - 10.3115/v1/d14-1105
Subject(s) - transliteration , hindi , computer science , spelling , natural language processing , artificial intelligence , social media , language identification , grammar , normalization (sociology) , named entity , world wide web , linguistics , natural language , philosophy , sociology , anthropology
Code-mixing is frequently observed in user-generated content on social media, especially from multilingual users. The linguistic complexity of such content is compounded by the presence of spelling variations, transliteration and non-adherence to formal grammar. We describe our initial efforts to create a multi-level annotated corpus of Hindi-English code-mixed text collated from Facebook forums, and explore language identification, back-transliteration, normalization and POS tagging of this data. Our results show that language identification and transliteration for Hindi are two major challenges that impact POS tagging accuracy.
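The abstract names four processing levels. The following is a minimal, hypothetical sketch of how those levels might be chained per token, in the order the abstract lists them. Tiny hand-made dictionaries (`ROMAN_HINDI`, `EN_NORMALIZE`, `POS_TAGS`) stand in for trained models. These names, the `annotate` function and the tag labels are illustrative assumptions, not the authors' system. The sketch also shows why language-identification errors propagate: a token the identifier cannot place gets neither back-transliteration nor normalization, and it falls back to the catch-all tag `X`.

```python
from dataclasses import dataclass

# Hypothetical toy resources; a real system would use trained models.
# Romanized Hindi spelling variants -> Devanagari (back-transliteration).
ROMAN_HINDI = {
    "nahi": "नहीं", "nahin": "नहीं",
    "bahut": "बहुत", "bahot": "बहुत",
    "acha": "अच्छा", "accha": "अच्छा",
    "hai": "है",
}
# English spelling variants -> canonical form (normalization).
EN_NORMALIZE = {
    "gud": "good", "good": "good", "pls": "please", "please": "please",
    "movie": "movie", "is": "is", "very": "very",
}
# Illustrative POS lookup keyed on the normalized / back-transliterated form.
POS_TAGS = {
    "good": "ADJ", "please": "INTJ", "movie": "NOUN", "is": "VERB",
    "very": "ADV", "नहीं": "PART", "बहुत": "ADV", "अच्छा": "ADJ", "है": "VERB",
}


@dataclass
class Token:
    surface: str  # token as written by the user
    lang: str     # "hi", "en" or "unk"
    form: str     # Devanagari for Hindi, normalized spelling for English
    pos: str      # POS tag, "X" when unknown


def annotate(text: str) -> list[Token]:
    """Annotate each whitespace-separated token at four levels."""
    tokens = []
    for raw in text.lower().split():
        # 1. Language identification (here: plain lexicon lookup).
        if raw in ROMAN_HINDI:
            lang = "hi"
            form = ROMAN_HINDI[raw]    # 2. back-transliterate Hindi
        elif raw in EN_NORMALIZE:
            lang = "en"
            form = EN_NORMALIZE[raw]   # 3. normalize English spelling
        else:
            lang = "unk"
            form = raw                 # unidentified: left untouched
        # 4. POS tagging on the cleaned-up form.
        tokens.append(Token(raw, lang, form, POS_TAGS.get(form, "X")))
    return tokens


for tok in annotate("movie bahot acha hai"):
    print(tok.surface, tok.lang, tok.form, tok.pos)
```

Because every later step is keyed on the output of language identification, a misidentified or unknown token reaches the tagger in a form its lexicon does not cover.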
