Sequence Similarity Networks for the Protein Universe
Author(s) - Whalen Katie, Sadkhin Boris, Davidson Daniel, Gerlt John
Publication year - 2015
Publication title - The FASEB Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.709
H-Index - 277
eISSN - 1530-6860
pISSN - 0892-6638
DOI - 10.1096/fasebj.29.1_supplement.573.17
Subject(s) - uniprot , computer science , workflow , sequence database , protein sequencing , protein function prediction , similarity (geometry) , function (biology) , computational biology , sequence homology , protein function , data mining , bioinformatics , database , artificial intelligence , biology , peptide sequence , genetics , gene , image (mathematics)
As of November 2014, over 86 million protein sequences had been deposited in the TrEMBL database, of which only 0.5 million had experimental support for an enzymatic function. Currently, protein databases depend heavily on homology‐based predictions of enzyme function, yet it is estimated that only 50% of current predicted functions in UniProtKB are correct. The process would benefit greatly from added expertise, while still maintaining a balance of careful curation and throughput. For sequence comparison and visualization, the sequence similarity network (SSN) is a computationally efficient alternative to the standard dendrogram. Making SSNs easily accessible to the non‐bioinformatician allows enzymologists, microbiologists, and chemists to observe the sequence identity landscape for a protein family of interest and select more informed identity boundaries for appropriate transfer of function via homology. This talk describes the efforts of the Enzyme Function Initiative to provide precomputed SSNs for each unique protein family described in the Pfam database, covering ~80% of the protein universe. The project combines state‐of‐the‐art computational resources with a rigorous workflow that accommodates updates three times per year. Networks generated at the minimum E‐value threshold (E‐5) are available instantaneously (compared to an average of 5.5 hr for de novo generation), and generation times for networks with a user‐defined E‐value threshold are reduced 10‐fold. This talk also addresses how universal precomputation of Pfam SSNs facilitates integration of sequence similarity information with additional EFI tools for expedited enzyme function discovery. Supported by NIH grant U54 GM093342.
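
To illustrate the idea behind an SSN, the following is a minimal sketch in Python and not the EFI implementation. It assumes pairwise E-values are already available (for example from an all-by-all sequence comparison), treats each sequence as a node, and keeps an edge only when the E-value is at or below a chosen threshold. The sequence names, the E-values, and the function names `build_ssn` and `clusters` are hypothetical, and reading the abstract's "E‐5" as `1e-5` is an assumption. The sketch also suggests why a network computed once at a permissive threshold can be re-filtered at a stricter one without recomputing the comparisons.

```python
# Toy pairwise E-values between five hypothetical sequences (invented data).
PAIRWISE_EVALUES = {
    ("A", "B"): 1e-40,
    ("B", "C"): 1e-30,
    ("C", "D"): 1e-6,
    ("D", "E"): 1e-25,
}


def build_ssn(pairwise_evalues, max_evalue):
    """Return (nodes, edges), keeping edges with E-value <= max_evalue."""
    nodes = set()
    edges = set()
    for (a, b), evalue in pairwise_evalues.items():
        nodes.update((a, b))  # every sequence stays in the network as a node
        if evalue <= max_evalue:
            edges.add((a, b))
    return nodes, edges


def clusters(nodes, edges):
    """Return connected components as sorted lists, ordered by first member."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    result = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(sorted(component))
    return result


if __name__ == "__main__":
    for threshold in (1e-5, 1e-20):
        nodes, edges = build_ssn(PAIRWISE_EVALUES, threshold)
        print(threshold, clusters(nodes, edges))
```

In this toy data, tightening the threshold from `1e-5` to `1e-20` drops the weak C-D edge and splits the single cluster into two, which is the kind of boundary a user would inspect before transferring a functional annotation across a family.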
