z-logo
open-access-imgOpen Access
Risk prediction of diabetes and pre-diabetes based on physical examination data
Author(s) -
Ying Han,
AUTHOR_ID,
Hui Yang,
Qin-Lai Huang,
ZiJie Sun,
Mingliang Li,
Jingbo Zhang,
Kejun Deng,
Shuo Chen,
Hao Lin,
AUTHOR_ID
Publication year - 2022
Publication title -
mathematical biosciences and engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.451
H-Index - 45
eISSN - 1551-0018
pISSN - 1547-1063
DOI - 10.3934/mbe.2022166
Subject(s) - impaired fasting glucose , diabetes mellitus , type 2 diabetes , medicine , logistic regression , population , endocrinology , impaired glucose tolerance , environmental health
Diabetes is a metabolic disorder caused by insufficient insulin secretion and insulin secretion disorders. From health to diabetes, there are generally three stages: health, pre-diabetes and type 2 diabetes. Early diagnosis of diabetes is the most effective way to prevent and control diabetes and its complications. In this work, we collected the physical examination data from Beijing Physical Examination Center from January 2006 to December 2017, and divided the population into three groups according to the WHO (1999) Diabetes Diagnostic Standards: normal fasting plasma glucose (NFG) (FPG < 6.1 mmol/L), mildly impaired fasting plasma glucose (IFG) (6.1 mmol/L ≤ FPG < 7.0 mmol/L) and type 2 diabetes (T2DM) (FPG > 7.0 mmol/L). Finally, we obtained1,221,598 NFG samples, 285,965 IFG samples and 387,076 T2DM samples, with a total of 15 physical examination indexes. Furthermore, taking eXtreme Gradient Boosting (XGBoost), random forest (RF), Logistic Regression (LR), and Fully connected neural network (FCN) as classifiers, four models were constructed to distinguish NFG, IFG and T2DM. The comparison results show that XGBoost has the best performance, with AUC (macro) of 0.7874 and AUC (micro) of 0.8633. In addition, based on the XGBoost classifier, three binary classification models were also established to discriminate NFG from IFG, NFG from T2DM, IFG from T2DM. On the independent dataset, the AUCs were 0.7808, 0.8687, 0.7067, respectively. Finally, we analyzed the importance of the features and identified the risk factors associated with diabetes.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here