Open Access

Evaluation of residue-residue contact prediction methods: From retrospective to prospective

Author(s) - Huiling Zhang, Zhendong Bei, Wenhui Xi, Min Hao, Zhen Ju, Konda Mani Saravanan, Haiping Zhang, Ning Guo, Yanjie Wei

Publication year - 2021

Publication title - PLOS Computational Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.628

H-Index - 182

eISSN - 1553-7358

pISSN - 1553-734X

DOI - 10.1371/journal.pcbi.1009027

Subject(s) - computer science, false positive paradox, protein structure prediction, benchmark (surveying), set (abstract data type), data mining, sequence (biology), training set, artificial intelligence, pattern recognition (psychology), machine learning, algorithm, protein structure, biology, geography, biochemistry, geodesy, programming language, genetics

Sequence-based residue contact prediction plays a crucial role in protein structure reconstruction. In recent years, the combination of evolutionary coupling analysis (ECA) and deep learning (DL) techniques has brought tremendous progress in residue contact prediction, so a comprehensive assessment of current methods on a large-scale benchmark data set is much needed. In this study, we evaluate 18 contact predictors on 610 non-redundant proteins and 32 CASP13 targets from a wide range of perspectives. The results show that different methods suit different application scenarios: (1) DL methods based on multiple categories of inputs and large training sets are the best choices for low-contact-density proteins, such as intrinsically disordered proteins, and for proteins with shallow multiple sequence alignments (MSAs). (2) With at least 5L effective sequences in the MSA (L is the sequence length), all methods perform at their best, and methods that rely only on the MSA as input can reach performance comparable to that of methods that adopt multi-source inputs. (3) For top L/5 and L/2 predictions, DL methods predict more hydrophobic interactions, while ECA methods predict more salt bridges and disulfide bonds. (4) ECA methods detect more secondary structure interactions, while DL methods accurately recover more contact patterns and prune isolated false positives. In general, multi-input DL methods with large training sets dominate current approaches with the best overall performance. Despite the great success of current DL methods, there is still much room for improvement: (1) With shallow MSAs, performance is greatly affected. (2) Current methods show lower precision for inter-domain than for intra-domain contact predictions, as well as highly imbalanced precision among intra-domain predictions. (3) Strong prediction similarities between DL methods indicate that more feature types and more diversified models need to be developed. (4) The runtime of most methods can be further optimized.
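
The abstract reports precision for the top L/5 and L/2 predictions. As a rough illustration of what such a metric measures, the following minimal Python sketch ranks residue pairs by predicted score and counts how many of the top L/k pairs are true contacts. The function name `top_lk_precision`, the minimum sequence separation of 6, and the toy data are assumptions made for this example; the abstract does not state the exact evaluation settings used in the study.

```python
def top_lk_precision(scores, contacts, k=5, min_sep=6):
    """Precision of the top L/k scored residue pairs.

    scores   -- L x L nested list of predicted contact scores (higher = more likely)
    contacts -- L x L nested list of booleans, True where the pair is a true contact
    k        -- divisor of L: k=5 gives top L/5, k=2 gives top L/2
    min_sep  -- minimum sequence separation j - i (assumed value, not from the source)
    """
    L = len(scores)
    # Collect each pair (i, j) once, keeping only pairs at least min_sep apart.
    pairs = [(scores[i][j], i, j)
             for i in range(L)
             for j in range(i + min_sep, L)]
    # Rank pairs from highest to lowest predicted score.
    pairs.sort(key=lambda t: t[0], reverse=True)
    top = pairs[:max(1, L // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if contacts[i][j])
    return hits / len(top)


# Toy example with L = 10 (hypothetical scores, not real predictions).
L = 10
scores = [[0.0] * L for _ in range(L)]
contacts = [[False] * L for _ in range(L)]
for (i, j), s in {(0, 6): 0.9, (1, 8): 0.8, (2, 9): 0.7,
                  (0, 7): 0.6, (3, 9): 0.5}.items():
    scores[i][j] = s
contacts[0][6] = True
contacts[2][9] = True

p_l5 = top_lk_precision(scores, contacts, k=5)  # top 2 pairs, 1 correct
p_l2 = top_lk_precision(scores, contacts, k=2)  # top 5 pairs, 2 correct
print(p_l5, p_l2)  # 0.5 0.4
```

In this toy case, the top L/5 list holds two pairs of which one is a true contact, and the top L/2 list holds five pairs of which two are true contacts.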
