Open Access
Fast protein structure comparison through effective representation learning with contrastive graph neural networks
Author(s) - Chun-Qiu Xia, Shihao Feng, Ying Xia, Xiaoyong Pan, HongBin Shen
Publication year - 2022
Publication title - PLOS Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.628
H-Index - 182
eISSN - 1553-7358
pISSN - 1553-734X
DOI - 10.1371/journal.pcbi.1009986
Subject(s) - computer science , graph , artificial intelligence , protein tertiary structure , cosine similarity , theoretical computer science , source code , discriminative model , protein structure prediction , artificial neural network , pattern recognition (psychology) , machine learning , protein structure , data mining , biochemistry , chemistry , physics , nuclear magnetic resonance , operating system
Protein structure alignment algorithms are often time-consuming, which makes large-scale retrieval based on protein structure similarity challenging. As the number of protein structures grows rapidly, more efficient structure comparison approaches are urgently needed. In this paper, we propose an effective graph-based protein structure representation learning method, GraSR, for fast and accurate structure comparison. In GraSR, a graph is constructed based on the intra-residue distance derived from the tertiary structure. Deep graph neural networks (GNNs) with a short-cut connection then learn graph representations of the tertiary structures under a contrastive learning framework. To further improve GraSR, a novel dynamic training data partition strategy and a length-scaling cosine distance are introduced. We objectively evaluate GraSR on SCOPe v2.07 and on a newly released independent test set from the PDB database, using a comprehensive performance metric that we designed. Compared with other state-of-the-art methods, GraSR achieves an improvement of about 7%-10% on the two benchmark datasets. GraSR is also much faster than alignment-based methods. Examining the model more closely, we observe that the superiority of GraSR stems mainly from the learned discriminative residue-level and global descriptors. The web server and source code of GraSR are freely available at www.csbio.sjtu.edu.cn/bioinf/GraSR/ for academic use.
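
To make the pipeline described in the abstract more concrete, the following is a minimal sketch of its two outer steps: building a residue graph from pairwise distances, and comparing two fixed-length descriptors with cosine similarity. It is not the GraSR implementation. The function names, the use of one coordinate per residue, and the 10 angstrom cutoff are all illustrative assumptions, and the learned GNN encoder and the length-scaling term of the distance are not reproduced because the abstract does not specify them.

```python
import math


def contact_graph(coords, cutoff=10.0):
    """Build adjacency lists from one 3D coordinate per residue.

    Residues i and j are linked when their Euclidean distance is below
    `cutoff`. The 10 angstrom default is an assumption for illustration,
    not a value taken from the paper.
    """
    n = len(coords)
    return {
        i: [j for j in range(n)
            if j != i and math.dist(coords[i], coords[j]) < cutoff]
        for i in range(n)
    }


def cosine_similarity(u, v):
    """Plain cosine similarity between two descriptor vectors.

    GraSR uses a length-scaling variant; that scaling is omitted here.
    """
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Toy example: three residues spaced 3.8 angstroms apart and one far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_graph = contact_graph(toy_coords)

# Hypothetical descriptors standing in for the output of a trained encoder.
query_descriptor = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 1.0]
```

Because each structure is reduced to a fixed-length vector, comparing a query against a database costs one vector operation per entry rather than one structural alignment per entry, which is the source of the speed advantage the abstract claims over alignment-based methods.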
