Open Access

An efficient algorithm for large-scale detection of protein families

Author(s) - Anton J. Enright

Publication year - 2002

Publication title - Nucleic Acids Research

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 9.008

H-Index - 537

eISSN - 1362-4954

pISSN - 0305-1048

DOI - 10.1093/nar/30.7.1575

Subject(s) - biology, cluster analysis, protein family, protein domain, computational biology, genomics, structural classification of proteins database, structural genomics, genome, protein function prediction, protein sequencing, human genome, hidden markov model, sequence alignment, comparative genomics, sequence (biology), protein superfamily, functional genomics, genetics, gene, peptide sequence, protein structure, computer science, machine learning, artificial intelligence, protein function, biochemistry

Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome and the resulting families have been used to annotate a large proportion of human proteins.
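The abstract names the Markov cluster (MCL) algorithm but does not spell out its steps. The following is a minimal Python sketch of the general MCL idea, not the TRIBE-MCL implementation itself: a column-stochastic matrix built from pairwise similarity scores is repeatedly squared (expansion) and raised element-wise to a power and renormalised (inflation) until it stops changing. The function names, the self-loop weight of 1.0, the inflation value of 2.0, the thresholds and the similarity scores in the demo are all assumptions chosen for illustration.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def expand(m):
    """Expansion step: square the matrix (two steps of a random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: element-wise power, then renormalise the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a symmetric similarity matrix; returns sorted index lists.

    Assumptions of this sketch: similarity[i][i] is 0 on input, and a
    self-loop of weight 1.0 is added to every node before normalising.
    """
    n = len(similarity)
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Rows that still carry weight act as attractors; the columns with
    # weight in such a row are the members of that attractor's cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical scores for five proteins: 0-2 are mutually similar,
    # 3-4 are mutually similar, and 2-3 share one weak hit.
    scores = [
        [0.0, 1.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.1, 0.0],
        [0.0, 0.0, 0.1, 0.0, 1.0],
        [0.0, 0.0, 0.0, 1.0, 0.0],
    ]
    print(mcl(scores))
```

Raising the inflation value generally produces smaller, tighter clusters, and lowering it produces larger ones. In practice the input scores would come from an all-against-all sequence comparison, which is the "precomputed sequence similarity information" the abstract refers to.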
