Don’t split your data
Author(s) -
Henrik Källberg,
Lars Alfredsson,
Maria Feychting,
Anders Ahlbom
Publication year - 2010
Publication title -
European Journal of Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.825
H-Index - 111
eISSN - 1573-7284
pISSN - 0393-2990
DOI - 10.1007/s10654-010-9447-3
Subject(s) - medicine , data set , bayesian probability , perspective (graphical) , sample (material) , set (abstract data type) , statistics , data mining , artificial intelligence , mathematics , computer science , chemistry , chromatography , programming language
False positive findings are a common problem in whole genome association studies. In this commentary we show that nothing is gained by randomly splitting a data sample into two equal-sized subsets, where the first subset is used for exploratory purposes and the second is used to confirm the findings from the first. We compare the random splitting procedure with analysis of the full data sample, taking a Bayesian perspective that accounts for the prior probability of a false positive finding.
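The comparison described in the abstract can be illustrated with a small calculation. The sketch below is not the authors' own computation. It assumes a one-sided z-test, a hypothetical standardized effect size, and a hypothetical prior probability that an association is real. The split procedure requires significance at level alpha in both halves, and the full-sample test is run at level alpha squared so that both procedures have the same false positive rate. All function and parameter names are illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def power_one_sided(effect, n, alpha):
    """Power of a one-sided z-test for a standardized effect with n observations."""
    critical = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - _STD_NORMAL.cdf(critical - effect * n ** 0.5)


def prob_true_given_positive(prior_true, power, false_positive_rate):
    """Bayes' rule: probability that a significant finding is a real association."""
    hits = prior_true * power
    false_alarms = (1.0 - prior_true) * false_positive_rate
    return hits / (hits + false_alarms)


def compare_split_with_full(effect, n, alpha, prior_true):
    """Posterior probability of a true finding under each analysis strategy."""
    # Split strategy (assumed): the finding must be significant at level alpha
    # in both independent halves, so the false positive rate is alpha squared.
    shared_fpr = alpha ** 2
    split_power = power_one_sided(effect, n / 2, alpha) ** 2

    # Full-sample strategy (assumed): a single test on all n observations at
    # level alpha squared, which matches the split strategy's false positive rate.
    full_power = power_one_sided(effect, n, shared_fpr)

    return {
        "split": prob_true_given_positive(prior_true, split_power, shared_fpr),
        "full": prob_true_given_positive(prior_true, full_power, shared_fpr),
    }


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    print(compare_split_with_full(effect=0.1, n=1000, alpha=0.05, prior_true=0.01))
```

Under these assumptions the full-sample analysis yields a posterior probability of a true finding at least as high as the split analysis, because at a matched false positive rate the single test on all observations has the greater power.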