Validation and extraction of molecular‐geometry information from small‐molecule databases
Author(s) -
Long Fei,
Nicholls Robert A.,
Emsley Paul,
Gražulis Saulius,
Merkys Andrius,
Vaitkus Antanas,
Murshudov Garib N.
Publication year - 2017
Publication title - Acta Crystallographica Section D
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 7.374
H-Index - 138
ISSN - 2059-7983
DOI - 10.1107/s2059798317000079
Subject(s) - database, consistency (knowledge bases), atom (system on chip), algorithm, molecule, molecular geometry, bond length, computer science, crystallography, chemistry, crystal structure, geometry, data mining, mathematics, organic chemistry, embedded system
A freely available small‐molecule structure database, the Crystallography Open Database (COD), is used for the extraction of molecular‐geometry information on small‐molecule compounds. The results are used to generate new ligand descriptions, which are subsequently used by macromolecular model‐building and structure‐refinement software. To increase the reliability of the derived data, and therefore of the new ligand descriptions, the entries from this database were subjected to very strict validation. The selection criteria ensured that the crystal structures used to derive atom types and bond and angle classes are of sufficiently high quality. Any suspicious entries at the crystal or molecular level were removed from further consideration. The selection criteria included (i) the resolution of the data used for refinement (entries solved at 0.84 Å resolution or better) and (ii) the structure‐solution method (structures must be from a single‐crystal experiment and all atoms of generated molecules must have full occupancies), as well as basic sanity checks such as (iii) consistency between the valences and the number of connections between atoms, (iv) acceptable bond‐length deviations from the expected values and (v) detection of atomic collisions. The derived atom types and bond classes were then validated using high‐order moment‐based statistical techniques. The results of the statistical analyses were fed back to fine‐tune the atom typing. The procedure was repeated four times, resulting in fine‐grained atom types and bond and angle classes. It will be repeated in the future as and when new entries are deposited in the COD. The whole procedure can also be applied to any source of small‐molecule structures, including the Cambridge Structural Database and the ZINC database.
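To make the selection criteria (i) to (v) and the moment-based check more concrete, the following is a minimal Python sketch under explicit assumptions; it is not the authors' pipeline. The `Entry` record, the covalent-radius table, the valence table, the 0.4 Å bond tolerance, the "non-bonded atoms closer than the sum of their radii" collision rule and the skewness/kurtosis thresholds are all hypothetical stand-ins chosen for illustration. The sum of covalent radii is used here only as a simple substitute for the "expected values" mentioned in the abstract.

```python
import math
from dataclasses import dataclass

# Hypothetical approximate single-bond covalent radii (Å); only these elements are supported.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
# Hypothetical maximum number of connections per element (criterion iii).
MAX_CONNECTIONS = {"H": 1, "C": 4, "N": 4, "O": 2}


@dataclass
class Entry:
    """Illustrative record for one structure; not the COD/CIF data model."""
    resolution: float                          # d_min in Å (smaller is better)
    single_crystal: bool
    occupancies: list[float]
    elements: list[str]
    coords: list[tuple[float, float, float]]   # Cartesian coordinates, Å
    bonds: list[tuple[int, int]]               # pairs of atom indices


def passes_filters(e: Entry, max_res: float = 0.84, bond_tol: float = 0.4) -> bool:
    # (i) resolution of 0.84 Å or better
    if e.resolution > max_res:
        return False
    # (ii) single-crystal experiment and all atoms fully occupied
    if not e.single_crystal or any(abs(o - 1.0) > 1e-6 for o in e.occupancies):
        return False
    # (iii) no atom may have more connections than its element allows
    degree = [0] * len(e.elements)
    for i, j in e.bonds:
        degree[i] += 1
        degree[j] += 1
    if any(d > MAX_CONNECTIONS[el] for d, el in zip(degree, e.elements)):
        return False
    # (iv) each bond length must lie within bond_tol of the sum of covalent radii
    bonded = set()
    for i, j in e.bonds:
        bonded.add(frozenset((i, j)))
        expected = COVALENT_RADII[e.elements[i]] + COVALENT_RADII[e.elements[j]]
        if abs(math.dist(e.coords[i], e.coords[j]) - expected) > bond_tol:
            return False
    # (v) collision: two non-bonded atoms closer than the sum of their radii
    n = len(e.elements)
    for i in range(n):
        for j in range(i + 1, n):
            if frozenset((i, j)) in bonded:
                continue
            limit = COVALENT_RADII[e.elements[i]] + COVALENT_RADII[e.elements[j]]
            if math.dist(e.coords[i], e.coords[j]) < limit:
                return False
    return True


def moments(values: list[float]) -> tuple[float, float, float, float]:
    """Mean, standard deviation, skewness and excess kurtosis (population moments)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        return mu, 0.0, 0.0, 0.0  # constant sample: higher moments are undefined
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    sd = math.sqrt(m2)
    return mu, sd, m3 / sd**3, m4 / m2**2 - 3.0


def class_needs_splitting(lengths: list[float], max_skew: float = 1.0,
                          max_kurt: float = 3.0) -> bool:
    # Hypothetical thresholds: a strongly skewed or heavy-tailed bond-length
    # distribution hints that one class mixes distinct chemical environments,
    # which would be fed back to refine the atom typing.
    _, _, skew, kurt = moments(lengths)
    return abs(skew) > max_skew or abs(kurt) > max_kurt


if __name__ == "__main__":
    water = Entry(resolution=0.80, single_crystal=True,
                  occupancies=[1.0, 1.0, 1.0], elements=["O", "H", "H"],
                  coords=[(0.0, 0.0, 0.0), (0.958, 0.0, 0.0), (-0.2399, 0.9275, 0.0)],
                  bonds=[(0, 1), (0, 2)])
    print(passes_filters(water))  # True
```

In this sketch an entry is rejected as soon as any single criterion fails, mirroring the abstract's description of discarding suspicious entries outright rather than scoring them; the statistical step works on a whole bond class at once, so it is kept separate from the per-entry filter.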