Improved prediction of smoking status via isoform-aware RNA-seq deep learning models

Zifeng Wang; Aria Masoomi; Zhonghui Xu; Adel Boueiz; Sool Lee; Tingting Zhao; Russell P. Bowler; Michael H. Cho; Edwin K. Silverman; Craig P. Hersh; Jennifer Dy; Peter J. Castaldi

Open Access

Publication year - 2021

Publication title - PLOS Computational Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.628

H-Index - 182

eISSN - 1553-7358

pISSN - 1553-734X

DOI - 10.1371/journal.pcbi.1009433

Subject(s) - exon, alternative splicing, gene isoform, RNA-seq, computational biology, gene, RNA splicing, gene prediction, biology, RNA, gene expression, genetics, bioinformatics, artificial intelligence, computer science, transcriptome, genome

Most predictive models based on gene expression data do not leverage information about gene splicing, even though splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects on gene expression. Using smoking status as the prediction target, we developed deep neural network models from gene-, exon-, and isoform-level quantifications of RNA sequencing data from 2,557 subjects in the COPDGene Study. Models using exon and isoform quantifications clearly outperformed gene-level models when using data from 5 genes from a previously published prediction model. Whereas the test set performance of the previously published model was 0.82 in the original publication, our exon-based models with an exon-to-isoform mapping layer achieved a test set AUC (area under the receiver operating characteristic curve) of 0.88, which improved to 0.94 when using exon quantifications from a larger set of genes. Isoform variability is an important source of latent information in RNA-seq data that can be used to improve clinical prediction models.
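
The abstract mentions an exon-to-isoform mapping layer but does not describe its internals. One plausible reading, offered here only as an assumption, is a linear layer whose weights are masked so that each isoform unit receives input solely from the exons annotated to that isoform. The NumPy sketch below illustrates that idea on a toy annotation; the function names, the 4-exon/2-isoform annotation, and the ReLU activation are all hypothetical and are not taken from the paper.

```python
import numpy as np

def make_exon_isoform_mask(n_exons, isoform_to_exons):
    """Build a binary (n_exons, n_isoforms) mask: 1 where an exon belongs to an isoform."""
    mask = np.zeros((n_exons, len(isoform_to_exons)))
    for j, exons in enumerate(isoform_to_exons):
        mask[list(exons), j] = 1.0
    return mask

def exon_to_isoform_layer(x, weights, bias, mask):
    """Masked linear layer followed by ReLU.

    Multiplying the weights by the mask zeroes every exon-isoform
    connection that the annotation does not support.
    """
    return np.maximum(0.0, x @ (weights * mask) + bias)

# Hypothetical toy annotation (an assumption for illustration only):
# isoform 0 contains exons 0, 1, 2; isoform 1 contains exons 0 and 3.
isoform_to_exons = [(0, 1, 2), (0, 3)]
mask = make_exon_isoform_mask(4, isoform_to_exons)

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 2))   # one weight per exon-isoform pair
bias = np.zeros(2)
x = rng.normal(size=(3, 4))         # 3 samples, 4 exon quantifications each

h = exon_to_isoform_layer(x, weights, bias, mask)  # shape (3, 2)
```

In a full model, the isoform-level activations `h` would feed further layers ending in a smoking-status classifier, and the masked weights would be learned by gradient descent. The mask keeps the parameter count tied to the annotation rather than to every possible exon-isoform pair.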
