Un método de análisis de lenguaje tipo SMS para el castellano
Author(s) -
José María Gómez Hidalgo,
Andrés Alfonso Caurcel Díaz,
Yovan Iñiguez del Rio
Publication year - 2013
Publication title -
Linguamática
Language(s) - English
DOI - 10.21814/lm.5.1.156
The use of specific language codes, and of chat and SMS-like messages, is a major trend in electronic communications. This makes Natural Language Processing quite hard, even at the simplest step of text message tokenization, due to the widespread use of non-alphanumeric symbols, frequent typos and non-standard word separators. In this work we present a new approach to text message tokenization, specific to the Spanish language as used in social networks and in electronic communications. Our system has been integrated into a more general application for age detection in social networks, developed in the research and development project WENDY. It has been quantitatively evaluated both directly and indirectly, through its impact on the general age-detection application, showing very promising results.