
On Text Preprocessing for Early Detection of Depression on Social Media
Author(s) -
José Manuel Cadenas,
Rodrigo Tripodi Calumby
Publication year - 2020
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/sbcas.2020.11504
Subject(s) - preprocessor, computer science, social media, data pre processing, field (mathematics), artificial intelligence, raw data, depression (economics), machine learning, balance (ability), natural language processing, data science, psychology, world wide web, mathematics, neuroscience, pure mathematics, economics, macroeconomics, programming language
Depression is a serious challenge to public health. Many of those who suffer from this disease use social media for information or relief. The text data produced by these users can be used to support research in this field. However, this raw information is not always suitable for direct use in machine learning. Hence, a comparative analysis of different preprocessing techniques was performed to assess their impact on the effectiveness of early depression detection on social media. The results show that preprocessing increases prediction effectiveness. Moreover, mapping emoticons to real emotion words was decisive not only for improving the model's effectiveness but also for keeping the balance between different evaluation measures.
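The emoticon mapping step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `EMOTICON_WORDS` table, the function names `map_emoticons` and `preprocess`, and the choice and order of the other cleaning steps (URL removal, lowercasing, punctuation stripping) are all assumptions made for illustration.

```python
import re

# Hypothetical emoticon-to-word table (an assumption for illustration;
# the mapping actually used in the paper is not given in this record).
EMOTICON_WORDS = {
    ":-)": "happy",
    ":)": "happy",
    ":D": "joy",
    ":-(": "sad",
    ":(": "sad",
    ":'(": "crying",
    ";)": "wink",
}

# Try longer emoticons first so that ":-(" is matched as a whole.
_EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICON_WORDS, key=len, reverse=True))
)

_URL_RE = re.compile(r"https?://\S+")
_NON_WORD_RE = re.compile(r"[^a-z0-9\s']")


def map_emoticons(text: str) -> str:
    """Replace each known emoticon with its emotion word."""
    replaced = _EMOTICON_RE.sub(lambda m: f" {EMOTICON_WORDS[m.group(0)]} ", text)
    return " ".join(replaced.split())  # collapse the padding spaces


def preprocess(text: str) -> str:
    """Illustrative cleaning pipeline; the step order is an assumption."""
    text = _URL_RE.sub(" ", text)       # drop URLs before touching ":" sequences
    text = map_emoticons(text)          # map before lowercasing (":D" is case sensitive)
    text = text.lower()
    text = _NON_WORD_RE.sub(" ", text)  # strip remaining punctuation
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("Feeling down today :( see http://example.com"))
```

Mapping emoticons before punctuation stripping matters in this sketch: if punctuation were removed first, the emoticons would be deleted and their emotional signal lost, which is the information the abstract reports as decisive.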