Avoiding model selection bias in small-sample genomic datasets

Daniel Berrar, Ian Bradbury, Werner Dubitzky

Author(s) - Daniel Berrar, Ian Bradbury, Werner Dubitzky

Publication year - 2006

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/btl066

Subject(s) - computer science , resampling , sample size determination , context (archaeology) , selection (genetic algorithm) , sample (material) , data mining , sampling (signal processing) , sampling bias , model selection , artificial intelligence , machine learning , selection bias , statistics , mathematics , biology , paleontology , chemistry , filter (signal processing) , chromatography , computer vision

Genomic datasets generated by high-throughput technologies are typically characterized by a moderate number of samples and a large number of measurements per sample. As a consequence, classification models are commonly compared using resampling techniques. This investigation discusses the conceptual difficulties involved in comparative classification studies. Conclusions derived from such studies are often optimistically biased, because the apparent differences in performance are usually not controlled within a statistically stringent framework that takes the adopted sampling strategy into account. We investigate this problem by comparing various classifiers on multiclass microarray data.
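The optimism described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: it assumes ten hypothetical classifiers that all have the same true error rate (0.30) and are each evaluated on a small test set of 30 cases, with misclassifications drawn independently (real classifiers scored on the same resamples are correlated, so this is a simplification). The function name and all parameter values are hypothetical. Reporting the error of whichever model looks best after the fact is compared with reporting the error of a model fixed in advance.

```python
import random
import statistics


def simulate_selection_bias(n_models=10, n_test=30, true_error=0.30,
                            n_trials=2000, seed=0):
    """Hypothetical simulation of model selection bias.

    Returns (mean observed error of a model fixed in advance,
             mean observed error of the apparently best model).
    All models share the same true error rate by construction.
    """
    rng = random.Random(seed)
    fixed, selected = [], []
    for _ in range(n_trials):
        # Observed error of each model: fraction of n_test cases misclassified,
        # where each case is wrong independently with probability true_error.
        observed = [
            sum(rng.random() < true_error for _ in range(n_test)) / n_test
            for _ in range(n_models)
        ]
        fixed.append(observed[0])        # model chosen before seeing results
        selected.append(min(observed))   # model "selected" after seeing results
    return statistics.mean(fixed), statistics.mean(selected)


if __name__ == "__main__":
    unbiased, optimistic = simulate_selection_bias()
    print(f"true error: 0.300  fixed model: {unbiased:.3f}  "
          f"selected best: {optimistic:.3f}")
```

Because every model has the same true error, any gap between the two averages is purely an artifact of choosing the winner on the same small test set. The fixed model's average stays close to 0.30, while the selected model's average falls well below it.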
