Confirmation of data mining based predictions of protein function
Author(s) - Ross D. King, Paul H. Wise, Amanda Clare
Publication year - 2004
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/bth047
Subject(s) - ORFs, computer science, annotation, function (biology), similarity (geometry), data mining, genome, voting, open reading frame, artificial intelligence, computational biology, machine learning, biology, genetics, peptide sequence, gene, image (mathematics), politics, political science, law
A central problem in bioinformatics is the assignment of function to sequenced open reading frames (ORFs). The most common approach is based on inferred homology using a statistically based sequence similarity (SIM) method, e.g. PSI-BLAST. Alternative non-SIM based bioinformatic methods are becoming popular. One such method is Data Mining Prediction (DMP). DMP combines evidence from amino-acid attributes, predicted structure and phylogenic patterns, and uses a combination of Inductive Logic Programming data mining and decision trees to produce prediction rules for functional class. DMP predictions are more general than those possible using homology. In 2000/1, DMP was used to make public predictions of the function of 1309 Escherichia coli ORFs. Since then, biological knowledge has advanced, allowing us to test our predictions.