Predictor correlation impacts machine learning algorithms: implications for genomic studies
Author(s) - Kristin K. Nicodemus, James D. Malley
Publication year - 2009
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btp331
Subject(s) - random forest, permutation (music), correlation, multivariate statistics, machine learning, computer science, inference, tree (set theory), algorithm, artificial intelligence, matthews correlation coefficient, regression, feature (linguistics), statistics, mathematics, mathematical analysis, linguistics, physics, geometry, philosophy, acoustics, support vector machine
The advent of high-throughput genomics has produced studies with large numbers of predictors (e.g. genome-wide association and microarray studies). Machine learning algorithms (MLAs) are a computationally efficient way to identify phenotype-associated variables in high-dimensional data. Important results from mathematical theory, along with numerous practical results, document their value. One attractive feature of MLAs is that many operate in a fully multivariate environment, allowing small-importance variables to be included when they act cooperatively. However, certain properties of MLAs under conditions common in genomic-related data have not been well studied; in particular, correlations among predictors pose a problem.