Structural similarity to link sequence space: New potential superfamilies and implications for structural genomics
Author(s) - Patrick Aloy, Baldomero Oliva, Enrique Querol, Francesc X. Aviles, Robert B. Russell
Publication year - 2002
Publication title - Protein Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.353
H-Index - 175
eISSN - 1469-896X
pISSN - 0961-8368
DOI - 10.1110/ps.3950102
Subject(s) - structural genomics, structural alignment, sequence alignment, structural classification of proteins database, alignment free sequence analysis, computational biology, sequence (biology), protein structure database, homology (biology), biology, structural similarity, protein superfamily, protein family, protein structure, similarity (geometry), structural motif, genomics, conserved sequence, genetics, peptide sequence, sequence database, genome, computer science, artificial intelligence, gene, biochemistry, image (mathematics)
The current pace of structural biology now means that protein three‐dimensional structure can be known before protein function, making methods for assigning homology via structure comparison of growing importance. Previous research has suggested that sequence similarity after structure‐based alignment is one of the best discriminators of homology and often functional similarity. Here, we exploit this observation, together with a merger of protein structure and sequence databases, to predict distant homologous relationships. We use the Structural Classification of Proteins (SCOP) database to link sequence alignments from the SMART and Pfam databases. We thus provide new alignments that could not be constructed easily in the absence of known three‐dimensional structures. We then extend the method of Murzin (1993b) to assign statistical significance to sequence identities found after structural alignment and thus suggest the best link between diverse sequence families. We find that several distantly related protein sequence families can be linked with confidence, showing the approach to be a means for inferring homologous relationships and thus possible functions when proteins are of known structure but of unknown function. The analysis also finds several new potential superfamilies, where inspection of the associated alignments and superimpositions reveals conservation of unusual structural features or co‐location of conserved amino acids and bound substrates. We discuss implications for Structural Genomics initiatives and for improvements to sequence comparison methods.
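The idea of scoring sequence identity after a structure-based alignment can be illustrated with a minimal sketch. This is not the method of Murzin (1993b) or the authors' extension of it, neither of which is specified in this record. As a stand-in, it counts identical residues over columns where both structures contribute a residue, then asks how likely that many matches would be by chance under a simple binomial model. The function names, the toy sequences, and the background match probability of 0.06 are all assumptions made for illustration.

```python
from math import comb


def identity_after_alignment(seq_a, seq_b, gap="-"):
    """Return (identical, aligned) counts for two pre-aligned sequences.

    Only columns where both sequences carry a residue are counted,
    mimicking positions that are structurally equivalent.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    identical = sum(1 for a, b in aligned if a == b)
    return identical, len(aligned)


def binomial_tail(k, n, p):
    """Probability of at least k matches in n positions, P(X >= k),
    for X ~ Binomial(n, p). A simplified chance model, assumed here."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy structure-based alignment (hypothetical sequences).
seq_a = "MKV-LAGT"
seq_b = "MRVQLA-T"

# Assumed probability that two unrelated residues match by chance.
BACKGROUND_MATCH_PROB = 0.06

identical, aligned = identity_after_alignment(seq_a, seq_b)
p_value = binomial_tail(identical, aligned, BACKGROUND_MATCH_PROB)
print(f"{identical}/{aligned} identical, chance probability ~ {p_value:.2e}")
```

A smaller tail probability suggests that the observed identity is less likely to arise by chance, which is the sense in which a link between two sequence families might be ranked as more or less convincing. A real analysis would need a background model calibrated on known unrelated structure pairs rather than a single fixed probability.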