Open Access
DNA Methylation Biomarkers-Based Human Age Prediction Using Machine Learning
Author(s) - Atef Zaguia, Deepak Pandey, Sandeep Painuly, Saurabh Kumar Pal, Vivek Kumar Garg, Neelam Goel
Publication year - 2022
Publication title - Computational Intelligence and Neuroscience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.605
H-Index - 52
eISSN - 1687-5273
pISSN - 1687-5265
DOI - 10.1155/2022/8393498
Subject(s) - random forest, dna methylation, gradient boosting, regression, artificial intelligence, support vector machine, correlation, linear regression, machine learning, pearson product moment correlation coefficient, regression analysis, boosting (machine learning), methylation, correlation coefficient, computer science, predictive modelling, statistics, biology, mathematics, genetics, dna, gene, gene expression, geometry
Purpose. Age can be an important clue to the identity of a person who left biological evidence at a crime scene. Epigenetic studies have demonstrated a close association between aging and DNA methylation, and with the availability of DNA methylation data, several age prediction models have been developed using statistical and machine learning methods. Most existing studies focus on healthy samples, whereas diseases may have a significant impact on human age. Therefore, this article proposes an age prediction model that uses DNA methylation biomarkers for both healthy and diseased samples.

Methods. The dataset contains 454 healthy samples and 400 diseased samples from publicly available sources, with ages ranging from 1 to 89 years. Six CpG sites that correlate highly with age are identified from these data using Pearson's correlation coefficient. The age prediction model is developed using four machine learning techniques: Multiple Linear Regression, Support Vector Regression, Gradient Boosting Regression, and Random Forest Regression. Separate models are designed for the healthy and the diseased data. The data are split randomly in an 80:20 ratio for training and testing, respectively.

Results. Among all the techniques, the model designed using Random Forest Regression shows the best performance, and Gradient Boosting Regression is the second-best model. For healthy samples, the model achieved a MAD of 2.51 years on the training data and 4.85 years on the testing data. For diseased samples, a MAD of 3.83 years is obtained for training and 9.53 years for testing.

Conclusion. These results show that the proposed model can predict age for both healthy and diseased samples.
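The workflow described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn with mostly default hyperparameters, runs on synthetic data generated by a hypothetical `make_synthetic` helper, and takes MAD to mean the mean absolute deviation between predicted and actual age (the abstract does not expand the acronym). The function names are invented for this example. Site selection is done on the training split only, which is a choice made here to avoid leakage; the abstract does not say at which stage the authors selected the six CpG sites.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def pearson_with_age(X, age):
    """Pearson correlation coefficient of every column of X with age."""
    Xc = X - X.mean(axis=0)
    ac = age - age.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (ac ** 2).sum())
    return (Xc * ac[:, None]).sum(axis=0) / denom


def select_top_cpgs(X, age, k=6):
    """Indices of the k columns with the largest absolute correlation."""
    r = pearson_with_age(X, age)
    return np.argsort(-np.abs(r))[:k]


def mad(y_true, y_pred):
    """Mean absolute deviation between actual and predicted ages (assumed meaning of MAD)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


def make_synthetic(n_samples=454, n_sites=50, n_informative=6, seed=0):
    """Hypothetical stand-in data: methylation-like values in which the
    first n_informative columns drift linearly with age."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(1, 89, size=n_samples)
    X = rng.uniform(0, 1, size=(n_samples, n_sites))
    for j in range(n_informative):
        X[:, j] = 0.2 + 0.006 * age + rng.normal(0, 0.03, size=n_samples)
    return X, age


def evaluate_models(X, age, k=6, seed=0):
    """Random 80:20 split, site selection, then train/test MAD per model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, age, test_size=0.2, random_state=seed
    )
    sites = select_top_cpgs(X_tr, y_tr, k)
    models = {
        "Multiple Linear Regression": LinearRegression(),
        # Scaling and C=100 are assumptions; the abstract gives no SVR settings.
        "Support Vector Regression": make_pipeline(StandardScaler(), SVR(C=100.0)),
        "Gradient Boosting Regression": GradientBoostingRegressor(random_state=seed),
        "Random Forest Regression": RandomForestRegressor(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr[:, sites], y_tr)
        results[name] = (
            mad(y_tr, model.predict(X_tr[:, sites])),
            mad(y_te, model.predict(X_te[:, sites])),
        )
    return sites, results


if __name__ == "__main__":
    X, age = make_synthetic()
    sites, results = evaluate_models(X, age)
    print("selected sites:", sorted(sites.tolist()))
    for name, (train_mad, test_mad) in results.items():
        print(f"{name}: train MAD {train_mad:.2f}, test MAD {test_mad:.2f}")
```

Because the data here are synthetic, the printed MAD values are not comparable to the figures reported in the abstract. To mirror the study's setup, the same procedure would be run once on the healthy samples and once on the diseased samples, since the authors build separate models for each group.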
