Prediction of interactiveness between small molecules and enzymes by combining gene ontology and compound similarity

Chen Lei; Qian Ziliang; Fen Kaiyan; Cai Yudong



Publication year - 2010

Publication title - Journal of Computational Chemistry

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.907

H-Index - 188

eISSN - 1096-987X

pISSN - 0192-8651

DOI - 10.1002/jcc.21467

Subject(s) - similarity (geometry), computer science, ontology, data mining, gene ontology, chemistry, artificial intelligence, gene, biochemistry, gene expression, image (mathematics), philosophy, epistemology

Determining whether a small organic molecule interacts with an enzyme helps in understanding the molecular and cellular functions of organisms and their metabolic pathways. In this research, we present a model that combines compound similarity and enzyme similarity to predict the interactiveness between small molecules and enzymes. A dataset consisting of 2859 positive couples of small molecule and enzyme and 286,056 negative couples was employed. Compound similarity, a measure of how similar two small molecules are, was proposed by Hattori et al., J Am Chem Soc 2003, 125, 11853 and is available at http://www.genome.jp/ligand-bin/search_compound. Enzyme similarity was obtained in three ways: the BLAST method, gene ontology items, and functional domain composition. A new distance between a pair of couples was then established, and the nearest neighbor algorithm (NNA) was employed to predict the interactiveness of enzymes and small molecules. To obtain a better balance between positive and negative samples when training the prediction model, a data distribution strategy was adopted: one-fourth of the couples were singled out as testing samples, and the remaining data were divided into seven training datasets, with all remaining positive samples added to every training dataset and only the negative samples divided among them. In this way, seven NNAs were built. Finally, a simple majority voting system was applied to integrate these seven models for predicting the testing dataset, which gave better prediction results than any single prediction model. The highest overall prediction accuracy reached 97.30%.
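The data distribution strategy and the voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the stratified hold-out, the round-robin division of negatives, and in particular the form of `couple_distance` are assumptions made for illustration, since the abstract does not give the actual distance formula or say how the testing samples were drawn.

```python
import random
from collections import Counter


def couple_distance(a, b, compound_sim, enzyme_sim):
    """Hypothetical distance between two (compound, enzyme) couples.

    Assumption: the distance is one minus the product of the two
    similarities. The abstract does not state the real formula.
    """
    return 1.0 - compound_sim(a[0], b[0]) * enzyme_sim(a[1], b[1])


def split_for_ensemble(positives, negatives, n_models=7,
                       test_fraction=0.25, seed=0):
    """Hold out a test set, then build n_models training sets.

    Every training set receives all remaining positives; the remaining
    negatives are divided among the training sets without overlap.
    Assumption: the hold-out is drawn separately from each class.
    """
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    n_pos_test = int(len(pos) * test_fraction)
    n_neg_test = int(len(neg) * test_fraction)
    test = ([(x, 1) for x in pos[:n_pos_test]]
            + [(x, 0) for x in neg[:n_neg_test]])
    train_pos, train_neg = pos[n_pos_test:], neg[n_neg_test:]

    training_sets = []
    for i in range(n_models):
        neg_slice = train_neg[i::n_models]  # every n_models-th negative
        training_sets.append([(x, 1) for x in train_pos]
                             + [(x, 0) for x in neg_slice])
    return training_sets, test


def nearest_neighbor_label(query, training_set, distance):
    """Return the label of the training sample closest to the query."""
    _, label = min(training_set, key=lambda item: distance(query, item[0]))
    return label


def majority_vote(query, training_sets, distance):
    """Combine one nearest-neighbor prediction per training set by simple majority."""
    votes = Counter(nearest_neighbor_label(query, ts, distance)
                    for ts in training_sets)
    return votes.most_common(1)[0][0]
```

With an odd number of models and two classes, the vote cannot tie. Sharing the positives across all seven training sets lowers the negative-to-positive ratio each model sees by roughly a factor of seven, without discarding any negative sample.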
