Text-based over-representation analysis of microarray gene lists with annotation bias
Author(s) -
Hui Sun Leong,
David Kipling
Publication year - 2009
Publication title -
nucleic acids research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 9.008
H-Index - 537
eISSN - 1362-4954
pISSN - 0305-1048
DOI - 10.1093/nar/gkp310
Subject(s) - annotation , biology , gene annotation , hypergeometric distribution , representation (politics) , information retrieval , computational biology , data mining , computer science , bioinformatics , gene , genetics , genome , statistics , mathematics , politics , political science , law
A major challenge in microarray data analysis is the functional interpretation of gene lists. A common approach to address this is over-representation analysis (ORA), which uses the hypergeometric test (or its variants) to evaluate whether a particular functionally defined group of genes is represented more than expected by chance within a gene list. Existing applications of ORA have been largely lim- ited to pre-defined terminologies such as GO and KEGG. We report our explorations of whether ORA can be applied to a wider mining of free-text. We found that a hitherto underappreciated feature of experimentally derived gene lists is that the consti- tuents have substantially more annotation asso- ciated with them, as they have been researched upon for a longer period of time. This bias, a result of patterns of research activity within the biomedical community, is a major problem for classical hyper- geometric test-based ORA approaches, which cannot account for such bias. We have therefore developed three approaches to overcome this bias, and demonstrate their usability in a wide range of published datasets covering different species. A comparison with existing tools that use GO terms suggests that mining PubMed abstracts can reveal additional biological insight that may not be possible by mining pre-defined ontologies alone.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom