Open Access
Typology for Linguistic Pattern in English-Hindi Journalistic Text Reuse
Author(s) - Aarti Kumar, Sujoy Das
Publication year - 2016
Publication title - International Journal of Information Technology and Computer Science
Language(s) - English
Resource type - Journals
eISSN - 2074-9015
pISSN - 2074-9007
DOI - 10.5815/ijitcs.2016.08.09
Subject(s) - computer science, paraphrase, natural language processing, hindi, artificial intelligence, typology, linguistics, linguistic typology, feature (linguistics), machine translation, scripting language, natural language, history, philosophy, archaeology, operating system
Linking and tracking news stories that cover the same events but are written in different languages is a challenging task. In natural languages, the same information may be expressed in multiple ways, and newspapers exploit this feature to make news stories more appealing. It has been observed that the same news story is presented in different ways, both within the same language and across languages, but normally the gist remains the same. The diversity of linguistic expressions presents a major challenge in identifying and tracking news stories covering the same events across languages, but doing so may provide rich and valuable resources, as comparable and parallel corpora can be generated from the linked stories. In the case of Indian languages, limited language resources exist for Natural Language Processing and Information Retrieval tasks, and identifying comparable and parallel documents would offer a potential source for deriving bilingual dictionaries and training statistical Machine Translation systems. Paraphrasing is the most common way of reproducing news stories, and translated text is also a type of paraphrase. Before monolingual or bilingual news stories can be linked, these paraphrase types need to be identified and classified to help researchers devise techniques for solving these challenging problems. The English-Hindi language pair differs not only in script but also in grammar and vocabulary. A number of paraphrase typologies have been built from the perspective of Natural Language Processing or for other specific applications, but to the best of the authors' knowledge, no typology has been reported for English-Hindi cross-language text reuse. In this paper, a typology is formulated for cross-lingual journalistic text reuse in English-Hindi. The typology reveals the levels of difficulty in English-Hindi mapping. It should help in devising techniques for linking and tracking English-Hindi stories.
