Open Access
Monitoring of Technical Variation in Quantitative High-Throughput Datasets
Author(s) -
Martin Lauss,
Ilhami Visne,
Albert Kriegner,
Markus Ringnér,
Göran Jönsson,
Mattias Höglund
Publication year - 2013
Publication title - Cancer Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.606
H-Index - 31
ISSN - 1176-9351
DOI - 10.4137/cin.s12862
Subject(s) - computer science , data mining , throughput , software , variation (astronomy) , analytics , principal component analysis , artificial intelligence , operating system , physics , astrophysics , wireless
High-dimensional datasets can be confounded by variation from technical sources, such as batches. Undetected batch effects can have severe consequences for the validity of a study's conclusions. We evaluate high-throughput RNAseq and miRNAseq datasets, as well as DNA methylation and gene expression microarray datasets, mainly from The Cancer Genome Atlas (TCGA) project, with respect to technical and biological annotations. We observe technical bias in these datasets and discuss corrective interventions. We then suggest a general procedure to control study design, detect technical bias using linear regression of principal components, correct for batch effects, and re-evaluate principal components. This procedure is implemented in the R package swamp and as software with a graphical user interface. In conclusion, high-throughput platforms that generate continuous measurements are sensitive to various forms of technical bias. For such data, monitoring of technical variation is an important analysis step.
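
The four-step procedure in the abstract (check the study design, detect technical bias by regressing principal components on annotations, correct, re-evaluate) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the swamp implementation. It reports the R² of each principal component regressed on a categorical annotation rather than regression p-values. It also uses simple per-batch mean-centering as a stand-in correction instead of a dedicated method such as ComBat, and it runs on simulated data. The helper names (`crosstab`, `pc_scores`, `design_matrix`, `pc_r2`, `center_batches`) and all simulated values are hypothetical.

```python
from collections import Counter

import numpy as np

# Hypothetical helpers for illustration only; this is not the swamp API.


def crosstab(a, b):
    """Count samples per (batch, group) pair. Missing or lopsided cells suggest confounding."""
    return Counter(zip(a, b))


def pc_scores(X, k):
    """Scores of the top-k principal components of a features x samples matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)  # center each feature across samples
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T * s[:k]  # samples x k


def design_matrix(labels):
    """Intercept plus one dummy column per non-reference level of a categorical annotation."""
    levels = sorted(set(labels))
    cols = [np.ones(len(labels))]
    for lev in levels[1:]:
        cols.append(np.array([lab == lev for lab in labels], dtype=float))
    return np.column_stack(cols)


def pc_r2(X, labels, k):
    """R^2 of a linear regression of each of the top-k PC scores on one annotation."""
    scores = pc_scores(X, k)
    D = design_matrix(labels)
    beta, *_ = np.linalg.lstsq(D, scores, rcond=None)
    ss_res = ((scores - D @ beta) ** 2).sum(axis=0)
    ss_tot = ((scores - scores.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


def center_batches(X, batches):
    """Naive stand-in correction: remove each feature's mean within every batch."""
    X_adj = np.array(X, dtype=float)
    batches = np.asarray(batches)
    for b in np.unique(batches):
        idx = batches == b
        X_adj[:, idx] -= X_adj[:, idx].mean(axis=1, keepdims=True)
    return X_adj


# Simulated data (hypothetical): 200 features x 30 samples in three batches.
rng = np.random.default_rng(0)
batches = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
groups = ["tumor", "normal"] * 15  # 5 tumor + 5 normal in every batch
X = rng.normal(size=(200, 30))
for b in "ABC":
    idx = np.array(batches) == b
    X[:, idx] += rng.normal(scale=1.5, size=(200, 1))  # batch-specific offset per feature
X[:20, np.array(groups) == "tumor"] += 3.0  # biological signal in 20 features

print("design:", dict(crosstab(batches, groups)))  # step 1: check study design
r2_before = pc_r2(X, batches, k=3)                  # step 2: detect batch effect in PCs
X_adj = center_batches(X, batches)                  # step 3: correct
r2_after = pc_r2(X_adj, batches, k=3)               # step 4: re-evaluate PCs
print("batch R^2 before:", np.round(r2_before, 2))
print("batch R^2 after: ", np.round(r2_after, 2))
print("group R^2 after: ", np.round(pc_r2(X_adj, groups, k=3), 2))
```

Per-batch centering is only reasonable here because every batch holds the same number of tumor and normal samples. If a biological group were concentrated in one batch, centering would remove the biological signal along with the batch effect, which is why the design check comes first.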