Open Access
Fishing with (Proto)Net—a principled approach to protein target selection
Author(s) - Linial Michal
Publication year - 2003
Publication title - Comparative and Functional Genomics
Language(s) - English
Resource type - Journals
eISSN - 1532-6268
pISSN - 1531-6912
DOI - 10.1002/cfg.328
Subject(s) - structural genomics, computer science, computational biology, structural classification of proteins database, structural bioinformatics, protein data bank (rcsb pdb), cluster analysis, protein superfamily, protein structure prediction, sequence (biology), protein structure, threading (protein sequence), superfamily, data mining, genomics, comparative genomics, biology, genome, machine learning, gene, genetics, biochemistry
Abstract - Structural genomics strives to represent the entire protein space. The first step towards this goal is to rationally select proteins whose structures have not been determined but that are likely to represent an as yet unknown structural superfamily or fold. Once such a structure is solved, it can be used as a template for modelling homologous proteins, which will help unveil the structural diversity of the protein space. Currently, no reliable method exists for accurate 3D structure prediction when no sequence or structural homologue is available. Here we present a systematic methodology for selecting target proteins whose structures are likely to belong to a new, as yet unknown superfamily or fold. Our method takes advantage of a global classification of the sequence space as presented by ProtoNet‐3D, a hierarchical agglomerative clustering of the proteins of interest (the proteins in Swiss‐Prot) together with all solved structures (taken from the PDB). By navigating the scaffold of ProtoNet‐3D, we obtain a prioritized list of proteins that are not yet structurally solved, along with the probability that each protein belongs to a new superfamily or fold. The sorted list has been self‐validated against real structural data that were not available when the predictions were made. The practical application of our computational–statistical method to determining novel superfamilies for structural genomics projects is also discussed. Copyright © 2003 John Wiley & Sons, Ltd.
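The idea of navigating a cluster hierarchy to prioritize unsolved proteins can be illustrated with a toy sketch. This is not the ProtoNet‐3D procedure and it does not compute the probabilities mentioned in the abstract; it assumes a hypothetical tree encoded as nested tuples, a hypothetical set of solved protein IDs, and a crude stand-in score: the number of merge levels an unsolved protein must climb before its cluster first contains a solved structure. The names `leaves` and `rank_targets` are invented for this illustration.

```python
def leaves(node):
    """Return the set of protein IDs under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return {node}
    members = set()
    for child in node:
        members |= leaves(child)
    return members


def rank_targets(tree, solved):
    """Rank unsolved proteins by how far they sit from any solved structure.

    The score is the number of levels climbed from the leaf until the
    enclosing cluster contains at least one solved protein. This is an
    assumed proxy for novelty, not the statistic used in the paper.
    """
    scores = {}

    def visit(node, ancestors):
        # ancestors holds the member sets of enclosing clusters, root first
        if isinstance(node, str):
            if node in solved:
                return
            for level, members in enumerate(reversed(ancestors), start=1):
                if members & solved:
                    scores[node] = level
                    return
            # no solved structure anywhere in the tree
            scores[node] = len(ancestors) + 1
            return
        members = leaves(node)
        for child in node:
            visit(child, ancestors + [members])

    visit(tree, [])
    # most remote proteins first; ties broken alphabetically
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical example: S1 is the only solved structure.
toy_tree = ((("P1", "S1"), "P2"), (("P3", "P4"), "P5"))
for protein, level in rank_targets(toy_tree, {"S1"}):
    print(protein, level)
```

In this toy tree, P3 and P4 rank first because they merge with a solved structure only at the root, which loosely mirrors the intuition that proteins clustered far from every solved structure are better candidates for a new superfamily or fold.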