Towards Proteome–Wide Interaction Models Using the Proteochemometrics Approach
Author(s) -
Strömbergsson Helena,
Lapins Maris,
Kleywegt Gerard J.,
Wikberg Jarl E. S.
Publication year - 2010
Publication title -
molecular informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201000052
Subject(s) - quantitative structure–activity relationship , proteome , ligand (biochemistry) , protein ligand , protein–protein interaction , cheminformatics , computational biology , computer science , interaction model , chemistry , interaction information , biological system , artificial intelligence , machine learning , computational chemistry , biology , mathematics , biochemistry , statistics , receptor , world wide web
A proteochemometrics model was induced from all interaction data in the BindingDB database, comprising a total of 7078 protein‐ligand complexes with representatives from all major drug target categories. Proteins were represented by alignment‐independent sequence descriptors holding information on properties such as hydrophobicity, charge, and secondary structure. Ligands were represented by commonly used QSAR descriptors. The inhibition constant (pKi) values of the protein‐ligand complexes were discretized into “high” and “low” interaction activity. Different machine‐learning techniques were used to induce models relating protein and ligand properties to the interaction activity. Decision trees performed best, giving an accuracy of 80% and an area under the ROC curve of 0.81. The tree pointed to the protein and ligand properties that are relevant for the interaction. As the approach requires neither alignments nor knowledge of protein 3D structures, virtually all available protein‐ligand interaction data could be utilized, thus opening a way to completely general interaction models that may span entire proteomes.
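The data preparation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pKi cut-off of 7.0, the function names, and the toy descriptor values are all assumptions made for illustration, since the abstract states neither the threshold nor the exact descriptors used.

```python
from typing import List, Sequence

# Hypothetical cut-off: the abstract does not state the value that was used.
PKI_THRESHOLD = 7.0


def discretize_pki(pki: float, threshold: float = PKI_THRESHOLD) -> str:
    """Map a continuous pKi value to a 'high' or 'low' activity class."""
    return "high" if pki >= threshold else "low"


def pair_features(protein_desc: Sequence[float],
                  ligand_desc: Sequence[float]) -> List[float]:
    """Describe one protein-ligand pair by concatenating both descriptor blocks."""
    return list(protein_desc) + list(ligand_desc)


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of pairs whose predicted class matches the actual class."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Toy, made-up data: (protein descriptors, ligand descriptors, pKi).
pairs = [
    ([0.2, -1.0], [310.4, 2.1], 8.3),
    ([0.2, -1.0], [150.2, 0.4], 5.1),
    ([1.5, 0.0], [310.4, 2.1], 6.2),
]

# One feature row and one class label per protein-ligand pair.
X = [pair_features(protein, ligand) for protein, ligand, _ in pairs]
y = [discretize_pki(pki) for _, _, pki in pairs]
```

The resulting `X` and `y` have the shape that a classifier such as a decision tree would expect: one row per protein-ligand pair, with protein and ligand properties side by side, and a binary activity label.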