Physicochemical descriptors to discriminate protein–protein interactions in permanent and transient complexes selected by means of machine learning algorithms
Author(s) -
Block Peter,
Paern Juri,
Hüllermeier Eyke,
Sanschagrin Paul,
Sotriffer Christoph A.,
Klebe Gerhard
Publication year - 2006
Publication title -
Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.21104
Subject(s) - support vector machine, artificial intelligence, machine learning, naive bayes classifier, computer science, algorithm, protein–protein interaction, protein structure, protein sequencing, pattern recognition (psychology), chemistry, peptide sequence, biochemistry, gene
Analyzing protein–protein interactions at the atomic level is critical for understanding the principles that govern protein–protein recognition. For this purpose, descriptors that explain the nature of different protein–protein complexes are desirable. In this work, we introduce Epic Protein Interface Classification as a framework that handles the preparation, processing, and analysis of protein–protein complexes for classification with machine learning algorithms. We applied four machine learning algorithms (Support Vector Machines, C4.5 Decision Trees, k-Nearest Neighbors, and Naïve Bayes) in combination with three feature selection methods (Filter (Relief F), Wrapper, and Genetic Algorithms) to extract discriminating features from the protein–protein complexes. To compare protein–protein complexes with each other, we represented the physicochemical characteristics of their interfaces in four different ways: two different atomic contact vectors, DrugScore pair potential vectors, and SFCscore descriptor vectors. We classified two datasets: (A) 172 protein–protein complexes comprising 96 monomers, forming contacts enforced by the crystallographic packing environment (crystal contacts), and 76 biologically functional homodimer complexes; (B) 345 protein–protein complexes containing 147 permanent complexes and 198 transient complexes. We were able to classify up to 94.8% of the packing enforced/functional and up to 93.6% of the permanent/transient complexes correctly. Furthermore, we were able to extract relevant features from the different protein–protein complexes and introduce an approach for scoring the importance of the extracted features. Proteins 2006. © 2006 Wiley‐Liss, Inc.
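The combination of a filter-style feature selection step with a classifier, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python, a basic Relief weighting (a simplification of Relief F that considers only one nearest hit and one nearest miss per sample), and a k-nearest-neighbors classifier evaluated by leave-one-out. The two-feature vectors and their labels are invented toy values standing in for interface descriptor vectors, and the function names are hypothetical.

```python
import math

# Toy descriptor vectors (invented for illustration, not data from the paper).
# Feature 0 separates the classes; feature 1 is mostly noise.
X = [
    [0.10, 0.50], [0.15, 0.30], [0.20, 0.70], [0.25, 0.40],  # "transient"
    [0.80, 0.45], [0.85, 0.65], [0.90, 0.35], [0.95, 0.60],  # "permanent"
]
y = ["transient"] * 4 + ["permanent"] * 4


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda s: euclidean(s[0], query))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    # Iterate labels in sorted order so a tied vote is resolved deterministically.
    return max(sorted(votes), key=lambda label: votes[label])


def leave_one_out_accuracy(xs, ys, k=3):
    """Fraction of samples classified correctly when each is held out in turn."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            hits += 1
    return hits / len(xs)


def relief_weights(xs, ys):
    """Basic Relief: reward features that differ on the nearest sample of the
    other class (miss) and agree on the nearest sample of the same class (hit)."""
    n_features = len(xs[0])
    weights = [0.0] * n_features
    for i, (x, label) in enumerate(zip(xs, ys)):
        others = [(euclidean(x, xs[j]), j) for j in range(len(xs)) if j != i]
        hit = min((d, j) for d, j in others if ys[j] == label)[1]
        miss = min((d, j) for d, j in others if ys[j] != label)[1]
        for f in range(n_features):
            weights[f] += abs(x[f] - xs[miss][f]) - abs(x[f] - xs[hit][f])
    return [w / len(xs) for w in weights]


print("Relief weights per feature:", relief_weights(X, y))
print("Leave-one-out accuracy (k=3):", leave_one_out_accuracy(X, y, k=3))
```

On this toy set the discriminating feature receives the larger Relief weight, which is the sense in which a filter method ranks descriptors before or alongside classification. Real descriptor vectors would need feature scaling and a larger neighborhood, as in Relief F.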