Open Access
Decision tree‑based classifiers for lung cancer diagnosis and subtyping using TCGA miRNA expression data
Author(s) - Masih Sherafatian, Fateme Arjmand
Publication year - 2019
Publication title - Oncology Letters
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.766
H-Index - 54
eISSN - 1792-1082
pISSN - 1792-1074
DOI - 10.3892/ol.2019.10462
Subject(s) - subtyping, lung cancer, adenocarcinoma, microrna, oncogene, oncology, biology, molecular medicine, cancer, decision tree, medicine, cell cycle, gene, machine learning, computer science, programming language, biochemistry
Lung cancer has the world's highest cancer-associated mortality rate, making biomarker discovery for this cancer a pressing issue. Machine learning approaches to identify molecular biomarkers are not as prevalent as screening of potential biomarkers by differential expression analysis. However, several differentially expressed miRNAs involved in cancer have been identified using this approach. The availability of The Cancer Genome Atlas (TCGA) allows the use of machine-learning methods for the molecular profiling of tumors. The present study used empirical negative control microRNAs (miRs) in lung cancer to normalize the lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) datasets from TCGA, and then built decision trees on the normalized data to classify lung cancer status and subtype. The two primary classification models together used four miRNAs for lung cancer diagnosis and subtyping: hsa-miR-183 and hsa-miR-135b distinguished lung tumors from normal samples taken from tissues adjacent to the tumor site, and hsa-miR-944 and hsa-miR-205 further classified the tumors into the major subtypes LUAD and LUSC. Specific cancer status classification models were also presented for each subtype.
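The two-stage scheme described in the abstract (tumor versus normal, then LUAD versus LUSC) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: it fits a single-threshold split (a depth-1 decision tree) per stage by minimizing Gini impurity, uses only one miRNA per stage (hsa-miR-183 and hsa-miR-205) for brevity, and runs on made-up expression values. The sample values, the direction of each split, and the names `gini`, `best_split`, `classify` and `SAMPLES` are all assumptions introduced for illustration.

```python
from typing import Dict, List, Sequence, Tuple

MIR_STATUS = "hsa-miR-183"   # stage 1 feature (one of the two named in the abstract)
MIR_SUBTYPE = "hsa-miR-205"  # stage 2 feature (one of the two named in the abstract)


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[str, int] = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values: Sequence[float], labels: Sequence[str]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) of the best single-threshold split."""
    pairs = sorted(zip(values, labels))
    best_thr, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_thr, best_score = thr, score
    if best_thr is None:
        raise ValueError("all values are identical; no split is possible")
    return best_thr, best_score


# Synthetic, made-up normalized expression values (NOT TCGA data).
SAMPLES: List[Dict] = [
    {MIR_STATUS: 1.0, MIR_SUBTYPE: 2.0, "label": "normal"},
    {MIR_STATUS: 1.2, MIR_SUBTYPE: 2.2, "label": "normal"},
    {MIR_STATUS: 1.5, MIR_SUBTYPE: 1.8, "label": "normal"},
    {MIR_STATUS: 3.0, MIR_SUBTYPE: 0.5, "label": "LUAD"},
    {MIR_STATUS: 3.4, MIR_SUBTYPE: 0.75, "label": "LUAD"},
    {MIR_STATUS: 4.1, MIR_SUBTYPE: 1.0, "label": "LUAD"},
    {MIR_STATUS: 3.2, MIR_SUBTYPE: 4.0, "label": "LUSC"},
    {MIR_STATUS: 3.8, MIR_SUBTYPE: 4.5, "label": "LUSC"},
    {MIR_STATUS: 4.4, MIR_SUBTYPE: 5.0, "label": "LUSC"},
]

# Stage 1: tumor vs normal, using all samples.
status_labels = ["normal" if s["label"] == "normal" else "tumor" for s in SAMPLES]
tumor_thr, tumor_gini = best_split([s[MIR_STATUS] for s in SAMPLES], status_labels)

# Stage 2: LUAD vs LUSC, using tumor samples only.
tumors = [s for s in SAMPLES if s["label"] != "normal"]
subtype_thr, subtype_gini = best_split(
    [s[MIR_SUBTYPE] for s in tumors], [s["label"] for s in tumors]
)


def classify(sample: Dict[str, float]) -> str:
    """Apply the two fitted splits in sequence.

    The split directions (higher value -> tumor, higher value -> LUSC)
    are assumptions that match the synthetic data above.
    """
    if sample[MIR_STATUS] <= tumor_thr:
        return "normal"
    return "LUSC" if sample[MIR_SUBTYPE] > subtype_thr else "LUAD"


print(tumor_thr, subtype_thr)
print(classify({MIR_STATUS: 3.5, MIR_SUBTYPE: 4.2}))
```

On this toy data both splits separate the classes perfectly, which real expression data would not be expected to do; a study of this kind would also need held-out evaluation and the normalization step the abstract describes.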