ResToRinG CaPitaLiZaTion in #TweeTs
Author(s) -
Kamel Nebhi,
Kalina Bontcheva,
Genevieve Gorrell
Publication year - 2015
Publication title -
Proceedings of the 24th International Conference on World Wide Web
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/2740908.2743039
Subject(s) - microblogging, computer science, social media, natural language processing, artificial intelligence, named entity recognition, capitalization, language model, information retrieval, world wide web, linguistics, task (project management), engineering, philosophy, systems engineering
The rapid proliferation of microblogs such as Twitter has made a vast quantity of written text available, much of it containing information of interest for NLP tasks. However, tweets are so noisy that standard NLP tools perform poorly on them. In this paper, we present a statistical truecaser for tweets that uses a 3-gram language model built from truecased newswire texts and tweets. Our truecasing method improves performance on named entity recognition and part-of-speech tagging tasks.
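The abstract names the core ingredient, a word-level 3-gram language model trained on truecased text, but not the smoothing or decoding procedure. The following is a minimal sketch under stated assumptions: add-one smoothing, and Viterbi search over the casing variants of each word observed in training, with unseen words left unchanged. The class name `TrigramTruecaser`, the toy corpus, and these design choices are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


class TrigramTruecaser:
    """Hypothetical sketch: a trigram LM over cased tokens plus Viterbi
    search over the casing variants seen for each word in training."""

    def __init__(self, sentences):
        self.tri = Counter()   # counts of (w1, w2, w3)
        self.ctx = Counter()   # counts of (w1, w2) as a trigram context
        self.variants = defaultdict(set)  # lowercase form -> cased forms seen
        vocab = set()
        for sent in sentences:
            for w in sent:
                self.variants[w.lower()].add(w)
                vocab.add(w)
            toks = [BOS, BOS] + sent + [EOS]
            for a, b, c in zip(toks, toks[1:], toks[2:]):
                self.tri[(a, b, c)] += 1
                self.ctx[(a, b)] += 1
        self.V = len(vocab) + 1  # +1 so EOS can be predicted

    def logprob(self, a, b, c):
        # Add-one (Laplace) smoothed log P(c | a, b); an assumption for brevity.
        return math.log((self.tri[(a, b, c)] + 1) / (self.ctx[(a, b)] + self.V))

    def truecase(self, tokens):
        # Each Viterbi state is the pair of the last two cased tokens.
        beams = {(BOS, BOS): (0.0, [])}
        for tok in tokens:
            # Unseen words keep their input casing; sorting makes ties deterministic.
            cands = sorted(self.variants.get(tok.lower(), {tok}))
            new = {}
            for (a, b), (score, path) in beams.items():
                for c in cands:
                    s = score + self.logprob(a, b, c)
                    if (b, c) not in new or s > new[(b, c)][0]:
                        new[(b, c)] = (s, path + [c])
            beams = new
        # Score the end-of-sentence transition before picking the best path.
        best = max(beams.items(),
                   key=lambda kv: kv[1][0] + self.logprob(kv[0][0], kv[0][1], EOS))
        return best[1][1]


# Toy truecased training corpus (hypothetical).
train = [
    ["Obama", "visited", "London", "today"],
    ["The", "president", "visited", "Paris"],
    ["I", "saw", "the", "president"],
]
tc = TrigramTruecaser(train)
print(tc.truecase(["the", "president", "visited", "london"]))
# ['The', 'president', 'visited', 'London']
```

Here "the" has two observed variants, and the sentence-initial context selects "The". A practical truecaser would train on large newswire and tweet corpora, as the abstract describes, and would likely use a stronger smoothing scheme such as Kneser-Ney; add-one smoothing is used only to keep the sketch short.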