Regression based machine learning model for dementia diagnosis in a community setting
Author(s) -
Salem Fatima Abu,
Chaaya Monique,
Ghannam Hiyam,
Al Feel Roaa E,
El Asmar Khalil
Publication year - 2021
Publication title -
Alzheimer's and Dementia
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.713
H-Index - 118
eISSN - 1552-5279
pISSN - 1552-5260
DOI - 10.1002/alz.053839
Subject(s) - undersampling, dementia, machine learning, random forest, artificial intelligence, population, medicine, statistics, computer science, psychology, disease, mathematics, environmental health
Abstract
Background: As the world's population ages, the prevalence of cognitive decline, dementia and Alzheimer's disease is increasing. From a public health perspective, and in the absence of a cure, the early identification of individuals at risk of dementia is of paramount importance for early prevention. We developed a machine learning (ML) application for dementia diagnosis that is based on the 10/66 one-stage dementia diagnostic algorithm.
Method: Training and testing data came from three community-based surveys conducted in Lebanon as part of a larger dementia cohort study. The training sample size was 802 and the testing sample size 200, and the dataset was imbalanced (20% positive, 80% negative). In our supervised approach, we explore techniques for imbalanced learning, beginning with artificially oversampling the minority class and undersampling the majority class in order to decrease the bias of any model trained on the dataset. Additionally, we incorporate cost-sensitive techniques to over-penalise the models when they misclassify an instance of the minority class. We explore both models that produce a probability that a person has dementia and models that produce a crisp class label (dementia or not).
Result: The best model hyperparameters, including the misclassification cost and the percentage of oversampling/undersampling, were tuned via three-times-repeated, 10-fold stratified cross-validation. The balanced random forest was our most robust probabilistic model: using only 20 features/variables, it achieved an F2 score of 0.82, a G-mean of 0.88 and a ROC AUC of 0.88. The calibrated weighted SVM, with the same number of features, was our best classification model, with an F2 score of 0.74 and a ROC AUC of 0.80.
Conclusion: Compared to the regular 10/66 one-stage dementia diagnostic algorithm, our proposed models show very promising results and a high level of accuracy using a predictive model of only 20 variables. With additional data, such models can gain in accuracy, precision and efficiency. Machine learning based models would serve as valuable community dementia screening and diagnostic tools.
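The abstract names random oversampling/undersampling and reports F2 and G-mean without defining them. The following is a minimal sketch of what those terms usually mean, not the authors' implementation. The function names, the toy labels and the confusion counts are hypothetical and chosen only for illustration. F2 is taken as the F-beta score with beta = 2, which weights recall above precision, and G-mean as the geometric mean of sensitivity and specificity.

```python
import random
from math import sqrt


def resample_indices(labels, n_minority, n_majority, seed=0):
    """Randomly oversample class 1 and undersample class 0 (hypothetical helper).

    Returns row indices; assumes n_minority >= number of positives and
    n_majority <= number of negatives.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Oversampling: keep every minority row, then add duplicates drawn with replacement.
    extra = [rng.choice(minority) for _ in range(n_minority - len(minority))]
    # Undersampling: keep a random subset of the majority rows, without replacement.
    kept = rng.sample(majority, n_majority)
    return minority + extra + kept


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion counts; beta=2 gives the F2 score."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


# Toy data with the 20%/80% class split described in the abstract.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
idx = resample_indices(toy_labels, n_minority=4, n_majority=6)
resampled = [toy_labels[i] for i in idx]

# Hypothetical confusion counts, not results from the study.
f2 = f_beta(tp=8, fp=4, fn=2)
gm = g_mean(tp=8, fp=4, tn=36, fn=2)
```

When resampling is combined with cross-validation, it is normally applied to the training folds only, so that the validation folds keep the original class balance.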