Characterising communities impacted by the 2015 Indiana HIV outbreak: A big data analysis of social media messages associated with HIV and substance abuse
Author(s) -
Cuomo Raphael E.,
Cai Mingxiang,
Shah Neal,
Li Jiawei,
Chen WenHao,
Obradovich Nick,
Mackey Tim K.
Publication year - 2020
Publication title -
Drug and Alcohol Review
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.018
H-Index - 74
eISSN - 1465-3362
pISSN - 0959-5236
DOI - 10.1111/dar.13091
Subject(s) - substance abuse, social media, census, public health, outbreak, geospatial analysis, environmental health, medicine, geography, demography, psychiatry, computer science, sociology, population, cartography, virology, world wide web, nursing
Introduction and Aims. Infoveillance approaches (i.e. surveillance methods using online content) that leverage big data can provide new insights about infectious disease outbreaks and substance use disorder topics. We assessed social media messages about HIV, opioid use and injection drug use in order to understand how unstructured data can prepare public health practitioners to respond to future outbreaks.
Design and Methods. We conducted a retrospective analysis of Twitter messages posted during the 2015 Indiana HIV outbreak, using machine learning, statistical and geospatial analysis to examine the transition from prescription opioid abuse to heroin injection use and finally to HIV transmission risk, and to test possible associations with disease burden and demographic variables in Indiana and Marion County. Tweets from October 2014 to June 2015 were compared to disease burden at the county level for Indiana, and census blocks in Marion County were classified by the presence of relevant messages. Marion County was chosen because it had the highest total count of tweets.
Results. 257 messages about substance abuse and HIV were significantly related to HIV rates (P < 0.001) and opioid-related hospitalisations (P = 0.037). Using 157 characteristics from the American Community Survey, a linear classifier was computed with an appreciable correlation (r = 0.49) to risk-related social media messages from Marion County.
Discussion and Conclusions. Communities appear to communicate online in response to disease burden. Classification produced an accurate equation to model census block risk based on census data, allowing for high-dimensional estimation of risk for blocks with sparse populations.