
A transcriptome-based classifier to determine molecular subtypes in medulloblastoma
Author(s) -
Komal S. Rathi,
Sherjeel Arif,
Mateusz Koptyra,
Ammar S. Naqvi,
Deanne Taylor,
Phillip B. Storm,
Adam Resnick,
Jo Lynne Rokita,
Pichai Raman
Publication year - 2020
Publication title -
plos computational biology/plos computational biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.628
H-Index - 182
eISSN - 1553-7358
pISSN - 1553-734X
DOI - 10.1371/journal.pcbi.1008263
Subject(s) - medulloblastoma , transcriptome , classifier (uml) , ptch1 , biology , computational biology , sonic hedgehog , gene expression profiling , microarray analysis techniques , microarray , bioinformatics , artificial intelligence , computer science , genetics , gene , gene expression
Medulloblastoma is a highly heterogeneous pediatric brain tumor with five molecular subtypes, Sonic Hedgehog TP53 -mutant, Sonic Hedgehog TP53 -wildtype, WNT, Group 3, and Group 4, defined by the World Health Organization. The current mechanism for classification into these molecular subtypes is through the use of immunostaining, methylation, and/or genetics. We surveyed the literature and identified a number of RNA-Seq and microarray datasets in order to develop, train, test, and validate a robust classifier to identify medulloblastoma molecular subtypes through the use of transcriptomic profiling data. We have developed a GPL-3 licensed R package and a Shiny Application to enable users to quickly and robustly classify medulloblastoma samples using transcriptomic data. The classifier utilizes a large composite microarray dataset (15 individual datasets), an individual microarray study, and an RNA-Seq dataset, using gene ratios instead of gene expression measures as features for the model. Discriminating features were identified using the limma R package and samples were classified using an unweighted mean of normalized scores. We utilized two training datasets and applied the classifier in 15 separate datasets. We observed a minimum accuracy of 85.71% in the smallest dataset and a maximum of 100% accuracy in four datasets with an overall median accuracy of 97.8% across the 15 datasets, with the majority of misclassification occurring between the heterogeneous Group 3 and Group 4 subtypes. We anticipate this medulloblastoma transcriptomic subtype classifier will be broadly applicable to the cancer research and clinical communities.