Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment

Open Access

Author(s) - Barbara Di Camillo, Tiziana Sanavia, Matteo Martini, Giuseppe Jurman, Francesco Sambo, Annalisa Barla, Margherita Squillario, Cesare Furlanello, Gianna Toffolo, Claudio Cobelli

Publication year - 2012

Publication title - PLOS ONE

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.99

H-Index - 332

ISSN - 1932-6203

DOI - 10.1371/journal.pone.0032200

Subject(s) - biomarker discovery, benchmark (surveying), computer science, population, consistency (knowledge bases), data mining, identification (biology), computational biology, in silico, feature selection, false discovery rate, bioinformatics, biology, machine learning, artificial intelligence, proteomics, medicine, genetics, gene, botany, environmental health, geodesy, geography

Motivation: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are attributable to 1) dataset size (few subjects with respect to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of the experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, on both simulated datasets (generated through an in silico regulation network model) and real clinical datasets, the consistency of the candidate biomarkers provided by a number of different methods.

Methods: We extensively simulated the effect of the heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both the intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling the evolution of a pool of subjects; then, a subset of them underwent alterations in regulatory mechanisms so as to mimic the disease state.

Results: The simulated data allowed us to outline the advantages and drawbacks of different methods across multiple studies and varying numbers of samples, and to evaluate the precision of feature selection on a benchmark with known biomarkers. Although comparable classification accuracy was reached by different methods, the use of external cross-validation loops is helpful in finding features with a higher degree of precision and stability. Application to real data confirmed these results.
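To make the idea of an external cross-validation loop concrete, the following is a minimal sketch and not the authors' pipeline. It assumes synthetic Gaussian data, a simple t-like ranking score, and a nearest-centroid classifier as stand-ins for the methods compared in the paper; all function names and parameters (`make_data`, `rank_features`, `external_cv`, `shift`, `k_features`) are hypothetical. The point it illustrates is that feature selection is repeated inside every training fold, so the held-out samples never influence which features are chosen, and the per-feature selection counts give a rough view of stability.

```python
import random
import statistics
from collections import Counter

def make_data(n_samples=40, n_features=50, n_informative=5, shift=2.0, seed=0):
    """Synthetic two-class data; the first n_informative features differ by `shift`."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so classes are balanced
        X.append([rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def rank_features(X, y, k):
    """Return the indices of the k features with the largest absolute t-like score."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / se, j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def external_cv(X, y, k_features=5, n_folds=5, seed=0):
    """External CV: features are re-selected on each training fold only."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    counts, correct = Counter(), 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        feats = rank_features(Xtr, ytr, k_features)  # test samples are never seen here
        counts.update(feats)
        # Nearest-centroid classifier restricted to the selected features.
        centroids = {
            lab: [statistics.mean(r[j] for r, l in zip(Xtr, ytr) if l == lab)
                  for j in feats]
            for lab in (0, 1)
        }
        for i in test:
            dist = {lab: sum((X[i][j] - c) ** 2 for j, c in zip(feats, cen))
                    for lab, cen in centroids.items()}
            correct += min(dist, key=dist.get) == y[i]
    return correct / len(X), counts

X, y = make_data()
accuracy, counts = external_cv(X, y)
print("held-out accuracy:", accuracy)
print("selection frequency per feature:", dict(counts))
```

A feature selected in every fold has a count equal to `n_folds`; features that appear in only one or two folds are less stable candidates. Selecting features once on the full dataset before cross-validating would instead leak information from the held-out samples into the selection step.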
