Alternative preprocessing of RNA-Sequencing data in The Cancer Genome Atlas leads to improved analysis results
Author(s) -
Mumtahena Rahman,
Laurie K. Jackson,
W. Evan Johnson,
Dean Y. Li,
Andrea H. Bild,
Stephen Piccolo
Publication year - 2015
Publication title -
Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btv377
Subject(s) - computational biology, biology, kras, preprocessor, pipeline (software), computer science, gene, data mining, genetics, artificial intelligence, mutation, programming language
The Cancer Genome Atlas (TCGA) RNA-Sequencing data are widely used in research. TCGA provides 'Level 3' data, which have been processed with a pipeline specific to that resource. However, using experimentally derived data, we have found that this pipeline produces gene-expression values that vary considerably across biological replicates. In addition, some RNA-Sequencing analysis tools require integer-based read counts, which the Level 3 data do not provide. As an alternative, we have reprocessed the data for 9264 tumor and 741 normal samples across 24 cancer types using the Rsubread package. We have also collated the corresponding clinical data for these samples. We provide these data as a community resource.
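Count-based analysis tools (for example, negative-binomial methods such as DESeq2 or edgeR) expect non-negative integer read counts. The snippet below is a minimal sketch, not part of the authors' pipeline, that checks whether an expression matrix meets that requirement before it is handed to such a tool. The function name `is_integer_count_matrix`, the tolerance, and the example values are hypothetical.

```python
import numpy as np


def is_integer_count_matrix(values, tol=1e-8):
    """Hypothetical helper: True if every entry is a non-negative whole number.

    Values within `tol` of an integer are treated as integers, to allow for
    floating-point storage of counts (e.g. 13.0).
    """
    arr = np.asarray(values, dtype=float)
    non_negative = np.all(arr >= 0)
    whole_numbers = np.all(np.abs(arr - np.round(arr)) <= tol)
    return bool(non_negative and whole_numbers)


# Illustrative (made-up) values: normalized, non-integer expression estimates
# versus raw integer read counts for two genes in two samples.
normalized = [[1523.47, 0.0], [88.12, 12.9]]
raw_counts = [[1523, 0], [88, 13]]

print(is_integer_count_matrix(normalized))  # False
print(is_integer_count_matrix(raw_counts))  # True
```

A check like this only detects whether values are whole numbers; it cannot tell whether they are genuine read counts or values that were rounded after normalization.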