Bioinformatics Methods to Select Prognostic Biomarker Genes from Large Scale Datasets: A Review
Author(s) - Jardillier Rémy, Chatelain Florent, Guyon Laurent
Publication year - 2018
Publication title - Biotechnology Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.144
H-Index - 84
eISSN - 1860-7314
pISSN - 1860-6768
DOI - 10.1002/biot.201800103
Subject(s) - benchmarking , computer science , biomarker discovery , feature selection , lasso (programming language) , computational biology , curse of dimensionality , scale (ratio) , data science , biomarker , data mining , machine learning , bioinformatics , gene , biology , proteomics , geography , marketing , world wide web , business , biochemistry , cartography
With the increased availability of survival datasets that combine molecular information (e.g., gene expression) with clinical information (e.g., patient survival), numerous genes have been proposed as prognostic biomarkers. Despite the effort and money invested, very few of these biomarkers have been clinically validated and are used routinely. A high false discovery rate is assumed to be largely responsible for this, in particular because the number of tested genes is extremely high relative to the number of patients followed. After describing the historical methodologies on which recent developments have often been based, this review describes studies performed in the last few years. The concepts are illustrated on a renal cancer dataset, and the corresponding scripts are provided (Supporting Information). These new developments fall into three main areas. First, variable selection has seen various improvements to lasso penalization. Second, the accurate definition of p-values and control of the false discovery rate have been the subject of many studies. Third, biological knowledge, often in the form of networks or pathways, can be incorporated as prior information and/or used to reduce dimensionality. These new and promising developments deserve benchmarking on various independent datasets by independent groups not involved in their development. Further work on the methodologies is also still required.
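
To make the false discovery rate concern concrete, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, a standard way to control the false discovery rate when many genes are tested at once. It is not taken from the review's Supporting Information; the function name `benjamini_hochberg` and the example p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Sketch of the Benjamini-Hochberg step-up procedure: find the largest
    rank k such that the k-th smallest p-value is <= k * alpha / m, then
    reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values for ten hypothetical genes (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
naive_hits = sum(p < 0.05 for p in example_p)   # unadjusted threshold
bh_hits = sum(benjamini_hochberg(example_p))    # FDR-controlled selection
```

On these made-up values, the unadjusted threshold p < 0.05 would retain five genes, whereas the step-up rule retains two. With tens of thousands of genes and only a few hundred patients, the gap between the two selections is typically far larger.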
