A network‐based variable selection approach for identification of modules and biomarker genes associated with end‐stage kidney disease
Author(s) - Zeng Xiaoxi, Li Chunyang, Li Yi, Yu Haopeng, Fu Ping, Hong Hyokyoung G., Zhang Wei
Publication year - 2020
Publication title - Nephrology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.752
H-Index - 61
eISSN - 1440-1797
pISSN - 1320-5358
DOI - 10.1111/nep.13655
Subject(s) - computational biology, biomarker, kegg, lasso (statistics), gene, biology, bioinformatics, genetics, gene expression, gene ontology, computer science, world wide web
Aims: Intervention for end‐stage kidney disease (ESKD), which is associated with adverse prognoses and major economic burdens, is challenging because of its complex pathogenesis. This study was performed to identify biomarker genes and molecular mechanisms for ESKD using a bioinformatics approach.

Methods: Using the Gene Expression Omnibus dataset GSE37171, this study identified pathways and genomic biomarkers associated with ESKD via a multi‐stage knowledge discovery process: identification of gene modules by weighted gene co‐expression network analysis, discovery of the important pathways involved by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses, selection of differentially expressed genes by the empirical Bayes method, and screening of biomarker genes by least absolute shrinkage and selection operator (Lasso) logistic regression. The results were validated using GSE70528, an independent testing dataset.

Results: Three clinically important gene modules associated with ESKD were identified by weighted gene co‐expression network analysis. Within these modules, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses revealed important biological pathways involved in ESKD, including transforming growth factor‐β and Wnt signalling, RNA splicing, autophagy, and chromatin and histone modification. Furthermore, Lasso logistic regression identified five final genes, namely CNOT8, MST4, PPP2CB, PCSK7 and RBBP4, that are differentially expressed and associated with ESKD. The accuracy of the final model in distinguishing ESKD cases from controls was 96.8% in the training dataset and 91.7% in the validation dataset.

Conclusion: Network‐based variable selection approaches can identify biological pathways and biomarker genes associated with ESKD. The findings may inform more in‐depth follow‐up research and effective therapy.
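
As a rough illustration of the final screening step, the sketch below shows how a Lasso (L1‐penalised) logistic regression can shrink the coefficients of uninformative variables to exactly zero, leaving a small set of selected predictors. It is not the authors' code and does not use GSE37171 or GSE70528: the data are synthetic, the proximal gradient solver is an assumption (the abstract does not state which solver or software was used), and every function name (`soft_threshold`, `fit_lasso_logistic`, `predict`, `make_toy_data`) is hypothetical.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=300):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient descent.

    The intercept b is left unpenalised. The step size and iteration count are
    illustrative defaults for standardised features, not tuned values.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict(X, w, b):
    """Classify as 1 when the linear score is positive (probability above 0.5)."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]


def make_toy_data(n=200, p=10, seed=0):
    """Synthetic stand-in for expression data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if row[0] + row[1] > 0 else 0 for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    w, b = fit_lasso_logistic(X, y, lam=0.05)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    acc = sum(int(a == c) for a, c in zip(predict(X, w, b), y)) / len(y)
    print("selected feature indices:", selected)
    print("training accuracy:", round(acc, 3))
```

In practice the penalty strength `lam` would normally be chosen by cross‐validation on the training dataset, and accuracy would be reported on an independent dataset, as the study does with its validation set.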