Un método de análisis de lenguaje tipo SMS para el castellano (A method for analyzing SMS-like language in Spanish)
Author(s) - José María Gómez Hidalgo, Andrés Alfonso Caurcel Díaz, Yovan Iñiguez del Rio
Publication year - 2013
Publication title - Linguamática
Language(s) - English
DOI - 10.21814/lm.5.1.156
The use of specific language codes and of chat and SMS-like messages is a major trend in electronic communications. This makes Natural Language Processing quite hard, even at the simplest step of text message tokenization, because of the widespread use of non-alphanumeric symbols, frequent typos, and non-standard word separators. In this work we present a new approach to text message tokenization, specific to the Spanish language as used in social networks and in electronic communications. Our system has been integrated into a more general application for age detection in social networks, developed in the research and development project WENDY. It has been evaluated quantitatively both directly and indirectly, through its impact on the general age-detection application, with very promising results.
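
To make the tokenization difficulty concrete, the following minimal Python sketch contrasts plain whitespace splitting with a small regular-expression tokenizer on an SMS-like Spanish message. It is an illustration only and not the method presented in the paper: the pattern `TOKEN_RE`, the function `tokenize`, and the sample messages are all assumptions made for this example.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the paper).
# Alternatives are tried in order: emoticons first, then alphanumeric runs
# (including Spanish accented letters), then any other single symbol.
TOKEN_RE = re.compile(
    r"[:;=][-^]?[)(DPp]"                # simple emoticons such as :) ;-) :D
    r"|[0-9A-Za-zÁÉÍÓÚÜÑáéíóúüñ]+"      # word-like runs, e.g. "q", "xq", "tal"
    r"|\S"                              # any other visible character, alone
)


def tokenize(message):
    """Split an SMS-like message into word, symbol, and emoticon tokens."""
    return TOKEN_RE.findall(message)


message = "hola.q tal?? :)"

# Whitespace splitting glues words to the non-standard separator "."
# and to the repeated punctuation.
print(message.split())    # ['hola.q', 'tal??', ':)']

# The regex sketch separates them while keeping the emoticon whole.
print(tokenize(message))  # ['hola', '.', 'q', 'tal', '?', '?', ':)']
```

The sketch handles only a few of the phenomena listed in the abstract. It does not correct typos or expand abbreviations such as "q" or "xq", which a real system for this kind of text would need to address.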
