The-more-the-better and the-less-the-better
Author(s) - Wentian Li
Publication year - 2006
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btl189
Subject(s) - computer science
BIOINFORMATICS EDITORIAL
To many biologists, geneticists and bioinformaticians, the excitement of genomics comes from the systematic analysis of large amounts of information, such as complete end-to-end DNA sequences, densely packed genetic markers on chromosomes and, sometimes, comprehensive population-genetic histories in places like Iceland and Finland, where extensive genealogical data are available. Moreover, recognizing the importance of sample size in mapping susceptibility genes for complex and common diseases, various national and international consortia have been established, and meta-analyses of pooled data are frequently used. Almost everybody agrees that 'the more (information), the better'.
The word 'more' is used here in two senses: one concerns the search space, the other concerns the sample size. It is easy to see why one would want either or both to be large. We demand a complete search space because we do not want to miss anything. If we fail to detect a genetic linkage or association signal for a human disease, is it because we have not covered all genomic regions with enough markers, or because we have not compiled a complete list of all coding genes? A complete search space removes these doubts. The reason for a larger sample size is well known in statistics: if the statistical signal is weak, only a larger dataset has a chance of uncovering it with confidence. Also, if the sample size is much smaller than the number of variables, as in the 'large p (number of genes), small n (sample size)' situation of microarray data, the variable space is not adequately explored, and degenerate fitting models (many different models fitting the data equally well) are possible. The 'large p, even larger n' situation is preferable.
The 'the more, the better' trend is a natural byproduct of the genomic era, and it will undoubtedly continue as ever more advanced biotechnology produces bioinformation faster and more cheaply.
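To make the sample-size point concrete, here is a minimal sketch, assuming a two-sided one-sample z-test as a textbook stand-in for "detecting a weak signal". The function name `z_test_power` and the standardized effect size of 0.1 are illustrative assumptions, not anything taken from the editorial. It shows that, for a fixed weak effect, power is low at small n and approaches 1 only as n grows.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test (illustrative assumption).

    effect_size: standardized mean shift (mean / sd) under the alternative.
    n: sample size.
    alpha: significance level of the test.
    """
    z_crit = _STD.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = effect_size * sqrt(n)         # mean of the test statistic under H1
    # Probability that the statistic lands in either rejection region.
    return _STD.cdf(shift - z_crit) + _STD.cdf(-shift - z_crit)


if __name__ == "__main__":
    # A weak signal (effect size 0.1) at increasing sample sizes.
    for n in (25, 100, 400, 1600):
        print(n, round(z_test_power(0.1, n), 3))
```

For this weak effect the printed power climbs from roughly 0.08 at n = 25 to roughly 0.98 at n = 1600, which is the "only a larger dataset" argument in numerical form.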
However, for a specific biological project or a particular human disease study, not all genes are involved and not all chromosomal regions are relevant. An equally important process, removing irrelevant information, allows us to focus on the key areas. A cartoon in Weiss and Terwilliger (2000) compared the search for human disease genes to finding a needle in a haystack. In this analogy, reducing the size of the haystack, rather than increasing it, improves the chance of finding the needle. This principle might be called 'the less, …
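One way to put numbers on the haystack analogy, offered as an illustration rather than as the editorial's own argument, is multiple-testing correction. Under a Bonferroni correction, each of m tests is run at the family-wise level divided by m, so a smaller search space leaves more power for the same true signal. The sketch below assumes two-sided z-tests; the function `bonferroni_power` and the chosen values of m, effect size and n are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def bonferroni_power(effect_size: float, n: int, m: int,
                     family_alpha: float = 0.05) -> float:
    """Power to detect one true signal when m tests are performed.

    Each test is assumed to be a two-sided one-sample z-test at the
    Bonferroni-adjusted level family_alpha / m (illustrative assumption).
    """
    alpha_per_test = family_alpha / m
    z_crit = _STD.inv_cdf(1 - alpha_per_test / 2)  # stricter as m grows
    shift = effect_size * sqrt(n)                  # unchanged by m
    return _STD.cdf(shift - z_crit) + _STD.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Same signal and sample size, shrinking haystack (number of tests m).
    for m in (1_000_000, 10_000, 100):
        print(m, round(bonferroni_power(0.1, 1600, m), 3))
```

With these values, power rises from under 0.1 when a million tests are corrected for to about 0.7 when only 100 are, even though the data and the signal are identical.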