Premium
Refinement of pairwise potentials via logistic regression to score protein‐protein interactions
Author(s) -
Tanemura Kiyoto A.,
Pei Jun,
Merz Kenneth M.
Publication year - 2020
Publication title -
proteins: structure, function, and bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.25973
Subject(s) - decoy , computer science , pairwise comparison , classifier (uml) , artificial intelligence , machine learning , docking (animal) , random forest , logistic regression , protein structure prediction , data mining , protein structure , biology , medicine , biochemistry , receptor , nursing
Protein-protein interactions (PPIs) are ubiquitous and functionally of great importance in biological systems. Hence, the accurate prediction of PPIs by protein-protein docking and scoring tools is highly desirable in order to characterize their structure and biological function. Ab initio docking protocols are divided into the sampling of docking poses to produce at least one near-native structure, and then to evaluate the vast candidate structures by scoring. Concurrent development in both sampling and scoring is crucial for the deployment of protein-protein docking software. In the present work, we apply a machine learning model on pairwise potentials to refine the task of protein quaternary structure native structure detection among decoys. A decoy set was featurized using the Knowledge and Empirical Combined Scoring Algorithm 2 (KECSA2) pairwise potential. The highly unbalanced decoy set was then balanced using a comparison concept between native and decoy structures. The resultant comparison descriptors were used to train a logistic regression (LR) classifier. The LR model yielded the optimal performance for native detection among decoys compared with conventional scoring functions, while exhibiting lesser performance for the detection of low root mean square deviation decoy structures. Its deployment on an independent benchmark set confirms that the scoring function performs competitively relative to other scoring functions. The scripts used are available at https://github.com/TanemuraKiyoto/PPI-native-detection-via-LR.