The ability to classify patients based on gene-expression data varies by algorithm and performance metric
Author(s) -
Stephen Piccolo,
Avery Mecham,
Nathan P. Golightly,
Jérémie L. Johnson,
Dustin B. Miller
Publication year - 2022
Publication title -
PLOS Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.628
H-Index - 182
eISSN - 1553-7358
pISSN - 1553-734X
DOI - 10.1371/journal.pcbi.1009926
Subject(s) - hyperparameter, machine learning, feature selection, artificial intelligence, computer science, univariate statistics, benchmarking, multiple kernel learning, performance metric, statistical classification, algorithm, support vector machine, data mining, multivariate statistics, kernel method
By classifying patients into subgroups, clinicians can provide more effective care than using a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist—and most support diverse hyperparameters—so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source, machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross-validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.
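The abstract describes tuning hyperparameters and selecting features with nested cross-validation, so that performance estimates are never computed on data used for tuning. The following is a minimal sketch of that evaluation scheme (not the authors' code; the synthetic data, parameter grid, and fold counts are illustrative assumptions), combining univariate feature selection with a kernel-based classifier in scikit-learn:

# A minimal sketch of nested cross-validation with univariate feature
# selection and hyperparameter optimization. Data are a synthetic
# stand-in for a patients-by-genes expression matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic "gene-expression" data: 200 patients, 2000 features.
X, y = make_classification(n_samples=200, n_features=2000,
                           n_informative=20, random_state=0)

# Pipeline: scale -> univariate feature selection -> kernel classifier.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("clf", SVC(kernel="rbf")),
])

# Hyperparameters tuned in the inner loop (illustrative grid).
grid = {
    "select__k": [10, 100, 1000],
    "clf__C": [0.1, 1, 10],
    "clf__gamma": ["scale", 1e-3],
}

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# The inner loop picks hyperparameters and features; the outer loop
# estimates predictive performance (AUROC) on held-out folds that the
# tuning procedure never sees.
search = GridSearchCV(pipe, grid, cv=inner, scoring="roc_auc")
scores = cross_val_score(search, X, y, cv=outer, scoring="roc_auc")
print(f"Nested-CV AUROC: {scores.mean():.3f} +/- {scores.std():.3f}")

Because feature selection sits inside the pipeline, it is refit on each training fold, avoiding the selection bias that arises when features are chosen using the full dataset before cross-validation.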