LANGaware: Leveraging machine learning on natural language for the early detection of neurodegenerative and psychiatric diseases
Author(s) -
Rentoumi Vassiliki,
Vassiliou Evangelos,
Demiraj Admir,
Pittaras Nikiforos,
Mandalis Petros,
Alexandridou Martha,
Kemp Hollie,
Eleftheriou Ioanna,
Danezi Maria,
Hatzopoulou Maria,
Kamtsadeli Vasiliki,
Paliouras George,
Papatriantafyllou John D
Publication year - 2021
Publication title -
Alzheimer's & Dementia
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.713
H-Index - 118
eISSN - 1552-5279
pISSN - 1552-5260
DOI - 10.1002/alz.052520
Subject(s) - computer science, random forest, classifier, artificial intelligence, cognition, feature selection, decision tree, natural language processing, spoken language, retraining, machine learning, psychology, neuroscience
Background Various neurodegenerative and psychiatric diseases are associated with changes in spoken language [1], although these changes have seldom been investigated. We evaluate how effectively our language-agnostic machine learning (ML) system detects subtle changes in spoken language that are early signs of cognitive decline, thereby assisting its diagnosis. We evaluate our methodology on speech recordings in multiple languages, obtained from patient cohorts in early stages of cognitive decline and from matched healthy controls.
Method Our methodology involves recording patients while they perform one or more predefined cognitive assessment tasks, e.g., describing a picture or recounting an everyday activity. The multi-language audio recordings and the generated transcripts are then analyzed with audio and NLP feature extraction methods [3], ranging from semantic, morpho-syntactic and phonological representations of the input to more sophisticated linguistic measures. The feature pool is filtered using a Pearson correlation (ρ) threshold of 0.85. We build a Random Forest classifier of 100 decision trees using the Gini impurity criterion, with 5-fold cross-validation for training, elimination- and composition-based feature selection, and post-selection retraining/fine-tuning. The model's diagnostic performance is evaluated on a test set unseen during training.
Result Our results are validated against diagnoses provided by medical experts. Our performance in terms of accuracy (~82%), F1 score (84%) and ROC-AUC (~82%) clearly indicates that speech analysis is effective for detecting cognitive decline. Moreover, our tree-based classifier produces probability scores that closely follow the proportion of pathological cases in the input data, with a correlation of 94%.
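As an illustration only, the training pipeline described under Method might be sketched with scikit-learn as below. Several details are assumptions not stated in the abstract: that the 0.85 Pearson threshold greedily removes one feature from each highly correlated pair (in column order); that recursive feature elimination with cross-validation (`RFECV`) is an acceptable stand-in for the "elimination-based" selection (the composition-based step and any further fine-tuning are omitted); the 75/25 held-out split; and the helper names `drop_correlated`, `fit_and_evaluate` and `make_demo_data`, which are hypothetical. The data are synthetic, not speech features, so the printed metrics say nothing about the reported results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split


def drop_correlated(X, threshold=0.85):
    """Hypothetical helper: indices of columns kept by a greedy Pearson filter.

    Assumed rule: a column is kept only if its absolute correlation with every
    previously kept column is at most `threshold`.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


def fit_and_evaluate(X, y, seed=0):
    """Hypothetical helper: filter -> RF (100 trees, Gini) + 5-fold RFECV -> test."""
    # Hold out a test set never seen during filtering, selection or training.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    # Fit the correlation filter on the training split only, apply to both.
    kept = drop_correlated(X_tr, threshold=0.85)
    X_tr, X_te = X_tr[:, kept], X_te[:, kept]

    forest = RandomForestClassifier(
        n_estimators=100, criterion="gini", random_state=seed)

    # Recursive feature elimination scored by 5-fold CV: a stand-in for the
    # "elimination-based" selection (composition-based selection not shown).
    selector = RFECV(
        forest, step=1, scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed))
    # After choosing the feature count, RFECV refits the forest on the whole
    # training split restricted to the selected features (post-selection retraining).
    selector.fit(X_tr, y_tr)

    proba = selector.predict_proba(X_te)[:, 1]
    pred = selector.predict(X_te)
    return {
        "n_after_filter": len(kept),
        "n_selected": int(selector.n_features_),
        "accuracy": accuracy_score(y_te, pred),
        "f1": f1_score(y_te, pred),
        "roc_auc": roc_auc_score(y_te, proba),
    }


def make_demo_data(seed=0):
    """Hypothetical helper: synthetic stand-in for speech features (NOT study data).

    The last column is a near-copy of the first, so the filter drops it.
    """
    X, y = make_classification(
        n_samples=200, n_features=8, n_informative=4, n_redundant=0,
        random_state=seed)
    rng = np.random.default_rng(seed)
    near_copy = X[:, :1] + 0.01 * rng.normal(size=(X.shape[0], 1))
    return np.hstack([X, near_copy]), y


if __name__ == "__main__":
    X, y = make_demo_data()
    print(fit_and_evaluate(X, y))
```

Fitting the correlation filter on the training split only keeps the held-out test set genuinely unseen, which matches the evaluation setup the abstract describes.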
Conclusion This evaluation confirms our hypothesis that speech has a strong capacity to predict cognitive decline. Audio analysis and machine learning prove to be invaluable tools for predicting early signs of cognitive decline, which accompany a wide spectrum of neurodegenerative and psychiatric diseases.
[1] Boschi, Veronica, et al. Frontiers in Psychology 8 (2017): 269.
[2] Rentoumi, Vassiliki, et al. Alzheimer's & Dementia 16 (2020). Wiley.
[3] Alberdi, Ane, et al. Artificial Intelligence in Medicine 71 (2016): 1-29.