Scalable remote homology detection and fold recognition in massive protein networks

Petegrosso Raphael; Li Zhuliu; Srour Molly A.; Saad Yousef; Zhang Wei; Kuang Rui

Publication year - 2019

Publication title - Proteins: Structure, Function, and Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.699

H-Index - 191

eISSN - 1097-0134

pISSN - 0887-3585

DOI - 10.1002/prot.25669

Subject(s) - CASP, computer science, scalability, massively parallel, protein sequencing, threading (protein sequence), Smith–Waterman algorithm, pairwise comparison, theoretical computer science, computational biology, artificial intelligence, protein structure, protein structure prediction, sequence alignment, parallel computing, biology, gene, peptide sequence, genetics, database, biochemistry

The global connectivity of very large protein similarity networks carries traces of evolution among the proteins, which can be used to detect remote evolutionary relationships or structural similarities. A key obstacle to investigating how well a protein network captures this evolutionary information is the intensive computation of pairwise sequence similarities needed to construct very large networks. In this article, we introduce label propagation on low-rank kernel approximation (LP-LOKA) for searching massively large protein networks. LP-LOKA propagates initial protein similarities in a low-rank graph obtained by Nyström approximation, without computing all pairwise similarities. With scalable parallel implementations based on distributed memory using the Message Passing Interface (MPI) and on Apache Hadoop/Spark in the cloud, LP-LOKA can search protein networks with one million proteins or more. In experiments on Swiss-Prot/ADDA/CASP data, LP-LOKA, which utilizes large protein networks, significantly improved protein ranking over the widely used HMM-HMM and profile-sequence alignment methods. The larger the protein similarity network, the better the performance, especially on relatively small protein superfamilies and folds. These results suggest that computing massively large protein networks is necessary to meet the growing need to annotate proteins from newly sequenced species, and that LP-LOKA is both scalable and accurate for searching such networks.
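The idea of propagating labels through a Nyström low-rank graph can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not specify their exact formulation. The sketch assumes the standard closed-form label propagation f = (1 - α)(I - αS)⁻¹y on a symmetrically normalized graph S. It uses a Gaussian kernel on toy feature vectors as a stand-in for protein sequence similarity and a hand-picked set of landmark points. All function names, parameters, and data are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.2):
    """Gaussian similarity between rows of A and rows of B.

    Assumption: a toy stand-in for a real protein sequence similarity score.
    """
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def low_rank_label_propagation(X, y, landmarks, alpha=0.5, gamma=0.2):
    """Label propagation on the Nystrom graph K ~ C W^-1 C^T.

    Only the n x m block C (all items vs. m landmarks) is computed;
    the full n x n similarity matrix is never formed.
    """
    C = rbf_kernel(X, X[landmarks], gamma)  # n x m similarities to landmarks
    W = C[landmarks]                        # m x m landmark-landmark block

    # Node degrees of the approximate graph: d = C W^-1 (C^T 1).
    d = C @ np.linalg.solve(W, C.sum(axis=0))
    d = np.maximum(d, 1e-12)                # assumption: guard against tiny degrees

    # Normalized graph S = F W^-1 F^T with F = D^{-1/2} C.
    F = C / np.sqrt(d)[:, None]

    # Woodbury identity:
    # (I - alpha F W^-1 F^T)^-1 = I + alpha F (W - alpha F^T F)^-1 F^T,
    # so only an m x m system is solved.
    core = W - alpha * (F.T @ F)
    return (1.0 - alpha) * (y + alpha * (F @ np.linalg.solve(core, F.T @ y)))


# Toy data: two well-separated groups of 20 items in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(4.0, 1.0, (20, 5))])

# Hypothetical landmark choice: four items from each group.
landmarks = np.r_[0:4, 20:24]

# The query is item 5 (first group); y is its indicator vector.
y = np.zeros(len(X))
y[5] = 1.0

scores = low_rank_label_propagation(X, y, landmarks)
ranking = np.argsort(-scores)  # items ordered from most to least related to the query
```

In this sketch the cost is dominated by the n x m block C and an m x m linear solve, which is the property that makes the low-rank approach attractive when n is in the millions. How landmarks are selected and how the work is distributed over MPI or Spark are outside the scope of this illustration.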
