Test set bias affects reproducibility of gene signatures
Author(s) -
Prasad Patil,
Pierre-Olivier Bachant-Winner,
Benjamin Haibe-Kains,
Jeffrey T. Leek
Publication year - 2015
Publication title -
Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btv157
Subject(s) - normalization (sociology) , computer science , data set , data mining , set (abstract data type) , test set , reproducibility , sample size determination , population , test data , statistics , pattern recognition (psychology) , artificial intelligence , mathematics , medicine , environmental health , sociology , anthropology , programming language
Prior to applying genomic predictors to clinical samples, the genomic data must be properly normalized to ensure that the test set data are comparable to the data upon which the predictor was trained. The most effective normalization methods depend on data from multiple patients. From a biomedical perspective, this implies that predictions for a single patient may change depending on which other patient samples they are normalized with. This test set bias will occur when any cross-sample normalization is used before clinical prediction.
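The effect described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-gene expression values, the per-gene z-scoring (a simple stand-in for any cross-sample normalization, not necessarily the methods evaluated in the paper), and the hypothetical linear predictor `predict` with its weights. The same patient profile is normalized alongside two different sets of companion samples, and the predicted class changes.

```python
import numpy as np


def zscore_across_samples(batch):
    """Center and scale each gene (column) using all samples (rows) in the batch.

    This is a cross-sample normalization: the result for one sample
    depends on every other sample in the batch.
    """
    mean = batch.mean(axis=0)
    std = batch.std(axis=0)
    return (batch - mean) / std


def predict(normalized_sample, weights):
    """Hypothetical linear predictor: a positive score means 'high risk'."""
    score = float(normalized_sample @ weights)
    return "high risk" if score > 0 else "low risk"


# Illustrative (made-up) expression values for two genes.
patient = np.array([5.0, 5.0])
weights = np.array([1.0, -1.0])  # assumed predictor weights

# The same patient placed in two different test batches.
batch_a = np.array([patient, [1.0, 9.0], [2.0, 8.0]])
batch_b = np.array([patient, [9.0, 1.0], [8.0, 2.0]])

# Row 0 is the patient in both batches.
pred_a = predict(zscore_across_samples(batch_a)[0], weights)
pred_b = predict(zscore_across_samples(batch_b)[0], weights)

print("Normalized with batch A:", pred_a)
print("Normalized with batch B:", pred_b)
```

The patient's raw measurements are identical in both runs; only the companion samples differ. A normalization that uses only the single sample, or a fixed reference, would not show this dependence.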