KFC2: A knowledge‐based hot spot prediction method based on interface solvation, atomic density, and plasticity features
Author(s) - Zhu Xiaolei, Mitchell Julie C.
Publication year - 2011
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.23094
Subject(s) - hot spot (computer programming), support vector machine, false positive rate, solvation, computer science, test set, feature (linguistics), set (abstract data type), training set, pattern recognition (psychology), artificial intelligence, false discovery rate, biological system, data mining, chemistry, solvent, biology, linguistics, philosophy, biochemistry, organic chemistry, gene, programming language, operating system
Hot spots constitute a small fraction of protein–protein interface residues, yet they account for a large fraction of the binding affinity. Based on our previous method (KFC), we present two new methods (KFC2a and KFC2b) that outperform other methods at hot spot prediction. A number of improvements were made in developing these new methods. First, we created a training data set that contained a similar number of hot spot and non‐hot spot residues. In addition, we generated 47 different features, and different numbers of features were used to train the models to avoid over‐fitting. Finally, two feature combinations were selected: One (used in KFC2a) is composed of eight features that are mainly related to solvent accessible surface area and local plasticity; the other (KFC2b) is composed of seven features, only two of which are identical to those used in KFC2a. The two models were built using support vector machines (SVM). The two KFC2 models were then tested on a mixed independent test set, and compared with other methods such as Robetta, FOLDEF, HotPoint, MINERVA, and KFC. KFC2a showed the highest predictive accuracy for hot spot residues (True Positive Rate: TPR = 0.85); however, the false positive rate was somewhat higher than for other models. KFC2b showed the best predictive accuracy for hot spot residues (True Positive Rate: TPR = 0.62) among all methods other than KFC2a, and the False Positive Rate (FPR = 0.15) was comparable with other highly predictive methods. Proteins 2011. © 2011 Wiley‐Liss, Inc.
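The abstract compares methods by True Positive Rate (TPR) and False Positive Rate (FPR). As a reminder of how these two figures are derived from binary predictions, here is a minimal sketch in Python. The function name `rates` and the label lists are hypothetical and purely illustrative; they are not data or code from the KFC2 study.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary labels: 1 = hot spot, 0 = non-hot spot."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hot spots predicted as hot spots
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hot spots that were missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-hot spots flagged as hot spots
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-hot spots correctly rejected
    # Guard against empty classes so the function never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical labels for ten interface residues (not taken from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = rates(y_true, y_pred)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

TPR is the fraction of true hot spots that a method recovers, and FPR is the fraction of non-hot spots it wrongly flags. A higher TPR obtained together with a higher FPR, as reported for KFC2a, reflects a more permissive classifier rather than a uniformly better one.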
