Hybrid System for Information Extraction from Social Media Text: Drug Abuse Case Study
Author(s) -
Ferdaous Jenhani,
Mohamed Salah Gouider,
Lamjed Ben Saïd
Publication year - 2019
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2019.09.224
Subject(s) - computer science , social media , information extraction , strengths and weaknesses , salient , field (mathematics) , data science , domain (mathematical analysis) , health care , artificial intelligence , natural language processing , information retrieval , machine learning , world wide web , mathematical analysis , philosophy , mathematics , epistemology , pure mathematics , economics , economic growth
Social media are becoming widely used in healthcare as a communication tool between patients and caregivers, giving rise to new sources of information rich in knowledge that may improve the field. Social media data analysis has therefore become a real business requirement for healthcare industry players and data scientists. However, because of the complexity and unstructured character of these data, existing natural language processing tools cannot exploit them successfully. A wide range of approaches has appeared in the literature, based on dictionaries, linguistic patterns, and machine learning, each with its own strengths and weaknesses. In this work, we propose a hybrid system that combines these approaches, taking advantage of each, to extract structured and salient drug abuse information from health-related tweets. We improve the system's accuracy by updating the domain dictionary in real time. We collected 1 tweets and conducted different experiments showing the advantage of hybridization for efficient information extraction from social media data.
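The abstract gives no implementation details, so the following is only a minimal sketch of how a dictionary lookup, a linguistic pattern, and a dictionary update step might fit together. The seed terms, the regular expression, the function names, and the `accept` callback (a stand-in for the machine learning component) are all assumptions made for illustration and are not taken from the paper.

```python
import re

# Hypothetical seed dictionary; the terms are illustrative only.
drug_dictionary = {"oxycodone", "xanax", "adderall"}

# Hypothetical linguistic pattern: an abuse-related verb followed by a
# candidate drug term, with an optional determiner in between.
ABUSE_PATTERN = re.compile(
    r"\b(?:snorted|popped|abusing|overdosed on)\s+(?:some\s+|a\s+|an\s+)?([a-z]+)",
    re.IGNORECASE,
)


def extract(tweet, dictionary):
    """Return known drug mentions and pattern-based candidate terms."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    # Dictionary step: keep tokens that are already known drug terms.
    found = {token for token in tokens if token in dictionary}
    # Pattern step: terms in an abuse context that the dictionary lacks.
    candidates = set()
    for match in ABUSE_PATTERN.finditer(tweet):
        term = match.group(1).lower()
        if term not in dictionary:
            candidates.add(term)
    return {"drugs": sorted(found), "candidates": sorted(candidates)}


def update_dictionary(dictionary, candidates, accept):
    """Add accepted candidates to the dictionary and return what was added.

    `accept` stands in for a trained classifier that decides whether a
    candidate is really a drug term; here it is any callable on a string.
    """
    added = [term for term in candidates if accept(term)]
    dictionary.update(added)
    return added


first = extract("He snorted percs again", drug_dictionary)
update_dictionary(drug_dictionary, first["candidates"], accept=lambda term: True)
second = extract("He snorted percs again", drug_dictionary)
```

In this sketch the first pass reports `percs` only as a candidate, and once the update step accepts it, the second pass finds it through the dictionary. A real system would replace the permissive `accept` callback with a trained model and use a far richer pattern set.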