
Depression and Suicide Analysis Using Machine Learning and NLP
Author(s) -
Pratyaksh Jain,
Karthik Ram Srinivas,
Abhishek Vichare
Publication year - 2022
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2161/1/012034
Subject(s) - suicidal ideation , machine learning , artificial intelligence , naive bayes classifier , depression (economics) , logistic regression , support vector machine , mental health , punctuality , suicide prevention , mental illness , psychology , computer science , poison control , psychiatry , medicine , medical emergency , mathematics , statistics , economics , macroeconomics
Depression is a common mental illness that can impair performance and lead to suicidal ideation or suicide attempts. Traditional techniques used by mental health experts can help determine an individual's type of depression. We used machine learning (ML) and natural language processing (NLP) to predict which posts indicate depression and to evaluate how accurate those predictions are. For this work, we used a dataset from Reddit. Reddit is well suited to supplement the traditional public health system because of the promptness with which ideas are exchanged, the versatility it offers for expressing emotions, and its compatibility with medical terminology. We examined comments and posts about suicidal ideation and used NLP to gain a better understanding of the interdisciplinary fields related to suicide. We identified two support groups for depression and suicidal thoughts: r/depression and r/SuicideWatch. The well-known r/SuicideWatch subreddit is commonly used by people who have suicidal thoughts and provides significant signals of suicidal behavior. A brief scan of the posts shows that these subreddits are legitimate online places to seek help and that they provide honest text data about people's mental states. To address the research problem, we considered these two subreddits, which provided appropriate information for tracking people at risk. We applied multiple ML algorithms: Logistic Regression, Naïve Bayes, Support Vector Machine (SVM), and Random Forest. Logistic Regression achieved 77.29% accuracy and an F1-score of 0.77, Naïve Bayes 74.35% accuracy and an F1-score of 0.74, SVM 77.12% accuracy and an F1-score of 0.77, and Random Forest 77.298% accuracy and an F1-score of 0.77.
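The abstract reports accuracy and F1-scores for four classifiers on posts from r/depression and r/SuicideWatch but does not describe the pipeline itself. Below is a minimal, dependency-free sketch of one of those classifiers, multinomial Naïve Bayes, together with the two reported metrics. Several details are assumptions, not taken from the paper: each post is labeled with its source subreddit, the tokenizer is a simple lowercase regex, features are raw word counts (no TF-IDF or stop-word removal), and F1 is macro-averaged over the two classes, since the abstract does not state which averaging was used. The names `MultinomialNB`, `tokenize`, `accuracy`, and `macro_f1` are hypothetical helpers, and the toy posts are invented for illustration rather than drawn from the dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Hypothetical multinomial Naive Bayes over bag-of-words counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of posts per class, for the prior
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)  # iterating a Counter yields its words
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, text, c):
        # Unnormalized log P(c | text) = log P(c) + sum of log P(word | c)
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, texts):
        return [max(self.classes, key=lambda c: self.log_posterior(t, c)) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (harmonic mean of precision and recall)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Invented toy posts, labeled by (assumed) source subreddit
train_texts = [
    "I feel empty and sad every day",
    "nothing makes me happy anymore so tired",
    "I keep thinking about ending my life",
    "I wrote a goodbye note tonight",
]
train_labels = ["depression", "depression", "SuicideWatch", "SuicideWatch"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict(["so sad and tired", "thinking about ending it"]))
# → ['depression', 'SuicideWatch']

print(accuracy(["a", "a", "b", "b"], ["a", "b", "b", "b"]))  # → 0.75
print(round(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]), 3))  # → 0.733
```

Logistic Regression, SVM, and Random Forest would plug into the same fit/predict and evaluation steps; in practice a library such as scikit-learn (which provides `MultinomialNB`, `LogisticRegression`, `LinearSVC`, `RandomForestClassifier`, and `f1_score`) would normally replace the hand-written model shown here.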