Identification of Diverse Database Subsets using Property‐Based and Fragment‐Based Molecular Descriptions
Author(s) -
Ashton Mark,
Barnard John,
Casset Florence,
Charlton Michael,
Downs Geoffrey,
Gorse Dominique,
Holliday John,
Lahana Roger,
Willett Peter
Publication year - 2002
Publication title - Quantitative Structure‐Activity Relationships
Language(s) - English
Resource type - Journals
eISSN - 1521-3838
pISSN - 0931-8771
DOI - 10.1002/qsar.200290002
This paper reports a comparison of calculated molecular properties and of 2D fragment bit‐strings when used for the selection of structurally diverse subsets of a file of 44,295 compounds. MaxMin dissimilarity‐based selection and k‐means cluster‐based selection are used to select subsets containing between 1% and 20% of the file. Investigation of the numbers of bioactive molecules in the selected subsets suggests that the MaxMin subsets are noticeably superior to the k‐means subsets, that the property‐based descriptors are marginally superior to the fragment‐based descriptors, and that both approaches are noticeably superior to random selection.
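To make the MaxMin procedure concrete, here is a minimal Python sketch of greedy MaxMin selection over fragment bit-strings. It is not the authors' implementation. The abstract does not state the dissimilarity measure or how the first compound is chosen, so two assumptions are made: each bit-string is represented as a set of "on" bit positions compared by Tanimoto distance, and selection starts from a caller-supplied seed compound. The helper names `tanimoto_distance` and `maxmin_select` are hypothetical.

```python
from typing import FrozenSet, List, Sequence


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) distance between two sets of 'on' bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty bit-strings are treated as identical
    return 1.0 - len(a & b) / union


def maxmin_select(fps: Sequence[FrozenSet[int]], n: int, seed: int = 0) -> List[int]:
    """Greedy MaxMin selection of n diverse compounds (hypothetical helper).

    Assumptions: Tanimoto distance on fragment bit-strings, and selection
    seeded with the compound at index `seed`.
    """
    if n <= 0 or not fps:
        return []
    n = min(n, len(fps))
    selected = [seed]
    chosen = {seed}
    # min_dist[i] = distance from compound i to its nearest selected compound
    min_dist = [tanimoto_distance(fp, fps[seed]) for fp in fps]
    while len(selected) < n:
        # Pick the unselected compound that is farthest from the current subset
        best = max((i for i in range(len(fps)) if i not in chosen),
                   key=lambda i: min_dist[i])
        selected.append(best)
        chosen.add(best)
        # Update each compound's nearest-selected distance with the new pick
        for i, fp in enumerate(fps):
            d = tanimoto_distance(fp, fps[best])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


if __name__ == "__main__":
    fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}),
           frozenset({7, 8}), frozenset({1, 2, 3, 9})]
    print(maxmin_select(fps, 3))  # [0, 2, 1]
```

Each step adds the compound whose nearest already-selected neighbour is farthest away, which is what makes the subset spread out across descriptor space. For the property-based descriptors, the same loop could be reused by swapping in a different distance, for example Euclidean distance on standardized property vectors; that substitution is likewise an assumption rather than something the abstract specifies.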