Cross‐Mapping of Protein‐Ligand Binding Data Between ChEMBL and PDBbind
Author(s) - Liu Zhihai, Li Jie, Liu Jie, Liu Yuchen, Nie Wei, Han Li, Li Yan, Wang Renxiao
Publication year - 2015
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201500010
Subject(s) - ChEMBL, protein ligand, computer science, computational biology, database, drug discovery, chemistry, bioinformatics, biology, biochemistry
The ChEMBL database is a valuable open data source that provides a comprehensive collection of binding data and of functional and ADMET properties of bioactive compounds. The PDBbind database has a more focused scope: it collects binding data for the protein‐ligand complexes in the Protein Data Bank. Currently, the PDBbind collection of binding data is rather modest compared with the ChEMBL collection (∼13 000 versus ∼1.3 million). One may therefore wonder whether the former is simply a subset of the latter. In this study, we mapped the molecular information and protein‐ligand binding data in PDBbind to the records in ChEMBL and then analyzed the overlap between the binding data recorded in the two databases. Our results indicate that only ∼20% of the binding data in PDBbind have counterparts in ChEMBL. Thus, the PDBbind collection of binding data is largely complementary to the ChEMBL collection. We also identify two reasons for the low overlap between the two databases: first, only a minor fraction of the protein‐ligand complexes in PDBbind is covered by ChEMBL; second, the literature spaces screened by the two databases do not overlap substantially either. Our study thus demonstrates the value of focused databases relative to more comprehensive ones.
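The overlap analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how records were matched, so the sketch assumes each binding record reduces to a (protein identifier, ligand identifier) pair and that two records correspond when both identifiers agree. The names `BindingRecord` and `overlap_fraction` and all identifiers in the toy data are hypothetical.

```python
from typing import Iterable, NamedTuple


class BindingRecord(NamedTuple):
    """Hypothetical matching key for one binding measurement."""
    protein_id: str  # assumed: a shared protein identifier
    ligand_key: str  # assumed: a canonical ligand identifier


def overlap_fraction(focused: Iterable[BindingRecord],
                     comprehensive: Iterable[BindingRecord]) -> float:
    """Fraction of distinct focused records that also occur in the comprehensive set."""
    focused_keys = set(focused)
    if not focused_keys:
        return 0.0
    comprehensive_keys = set(comprehensive)
    return len(focused_keys & comprehensive_keys) / len(focused_keys)


# Made-up toy records for illustration only; not taken from either database.
focused_db = [
    BindingRecord("PROT_1", "LIG_A"),
    BindingRecord("PROT_1", "LIG_B"),
    BindingRecord("PROT_2", "LIG_C"),
    BindingRecord("PROT_3", "LIG_D"),
]
comprehensive_db = [
    BindingRecord("PROT_1", "LIG_A"),  # the only record shared with focused_db
    BindingRecord("PROT_4", "LIG_E"),
    BindingRecord("PROT_5", "LIG_F"),
]

print(overlap_fraction(focused_db, comprehensive_db))
```

Exact matching on both identifiers is the simplest possible criterion. A real comparison would probably also need to reconcile ligand representations and compare the reported binding values and their literature sources, none of which is modeled here.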