A Machine Learning Classifier for Assigning Individual Patients With Systemic Sclerosis to Intrinsic Molecular Subsets
Author(s) -
Franks Jennifer M.,
Martyanov Viktor,
Cai Guoshuai,
Wang Yue,
Li Zhenghui,
Wood Tammara A.,
Whitfield Michael L.
Publication year - 2019
Publication title -
Arthritis and Rheumatology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 4.106
H-Index - 314
eISSN - 2326-5205
pISSN - 2326-5191
DOI - 10.1002/art.40898
Subject(s) - classifier (uml), artificial intelligence, discriminative model, pattern recognition (psychology), gene expression profiling, computer science, machine learning, cluster analysis, support vector machine, gene expression, gene, biology, biochemistry
Objective
High‐throughput gene expression profiling of tissue samples from patients with systemic sclerosis (SSc) has identified 4 “intrinsic” gene expression subsets: inflammatory, fibroproliferative, normal‐like, and limited. Prior methods required agglomerative clustering of many samples. In order to classify individual patients in clinical trials or for diagnostic purposes, supervised methods that can assign single samples to molecular subsets are required. We undertook this study to introduce a novel machine learning classifier as a robust accurate intrinsic subset predictor.

Methods
Three independent gene expression cohorts were curated and merged to create a data set covering 297 skin biopsy samples from 102 unique patients and controls, which was used to train a machine learning algorithm. We performed external validation using 3 independent SSc cohorts, including a gene expression data set generated by an independent laboratory on a different microarray platform. In total, 413 skin biopsy samples from 213 individuals were analyzed in the training and testing cohorts.

Results
Repeated cross‐fold validation identified consistent and discriminative markers using multinomial elastic net, performing with an average classification accuracy of 87.1% with high sensitivity and specificity. In external validation, the classifier achieved an average accuracy of 85.4%. Reanalyzing data from a previous study, we identified subsets of patients that represent the canonical inflammatory, fibroproliferative, and normal‐like subsets.

Conclusion
We developed a highly accurate classifier for SSc molecular subsets for individual patient samples. The method can be used in SSc clinical trials to identify an intrinsic subset on individual samples. Our method provides a robust data‐driven approach to aid clinical decision‐making and interpretation of heterogeneous molecular information in SSc patients.
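
The abstract reports classification accuracy together with sensitivity and specificity for a four-class problem, but does not say how those figures were computed. As a rough illustration only, the sketch below shows one common way to obtain them: overall accuracy plus one-vs-rest sensitivity and specificity for each subset. The function name `per_subset_metrics` and the toy labels are hypothetical and are not data or code from the study.

```python
from collections import Counter

# The four intrinsic subsets named in the abstract.
SUBSETS = ["inflammatory", "fibroproliferative", "normal-like", "limited"]


def per_subset_metrics(true_labels, predicted_labels, subsets=SUBSETS):
    """Return (accuracy, {subset: {"sensitivity": ..., "specificity": ...}}).

    Sensitivity and specificity are computed one-vs-rest: each subset in
    turn is treated as the positive class and all others as negative.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    # Count every (true, predicted) pair, i.e. a sparse confusion matrix.
    pairs = Counter(zip(true_labels, predicted_labels))
    total = len(true_labels)
    correct = sum(n for (t, p), n in pairs.items() if t == p)

    metrics = {}
    for s in subsets:
        tp = pairs[(s, s)]
        fn = sum(n for (t, p), n in pairs.items() if t == s and p != s)
        fp = sum(n for (t, p), n in pairs.items() if t != s and p == s)
        tn = total - tp - fn - fp
        metrics[s] = {
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return correct / total, metrics


# Invented toy labels for illustration; NOT results from the paper.
true_labels = [
    "inflammatory", "inflammatory",
    "fibroproliferative", "fibroproliferative",
    "normal-like", "normal-like",
    "limited", "limited",
]
predicted_labels = [
    "inflammatory", "fibroproliferative",
    "fibroproliferative", "fibroproliferative",
    "normal-like", "normal-like",
    "limited", "normal-like",
]

accuracy, metrics = per_subset_metrics(true_labels, predicted_labels)
print(f"accuracy = {accuracy:.3f}")
for subset, m in metrics.items():
    print(f"{subset}: sensitivity={m['sensitivity']:.3f}, "
          f"specificity={m['specificity']:.3f}")
```

In a repeated cross-validation setting, these quantities would typically be computed on each held-out fold and then averaged. Whether the authors averaged per fold or pooled predictions is not stated in the abstract.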