A natural language processing approach based on embedding deep learning from heterogeneous compounds for quantitative structure–activity relationship modeling
Author(s) - Bouhedjar Khalid, Boukelia Abdelbasset, Khorief Nacereddine Abdelmalek, Boucheham Anouar, Belaidi Amine, Djerourou Abdelhafid
Publication year - 2020
Publication title - Chemical Biology and Drug Design
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.59
H-Index - 77
eISSN - 1747-0285
pISSN - 1747-0277
DOI - 10.1111/cbdd.13742
Subject(s) - computer science , embedding , deep learning , artificial intelligence , convolutional neural network , word embedding , machine learning , artificial neural network , semantics (computer science) , word (group theory) , programming language , mathematics , geometry
Over the past decade, rapid development in biological and chemical technologies such as high-throughput screening and parallel synthesis has significantly increased the amount of available data, which calls for the creation and integration of new analytical methods, especially deep learning models. Interest in applying deep learning to computer-aided drug discovery has grown recently, owing to its notable success in many other fields. The present work proposes a natural language processing approach based on embedding deep neural networks. Our method transforms the Simplified Molecular Input Line Entry System (SMILES) format into word embedding vectors that represent the semantics of compounds. These vectors are fed into supervised machine learning algorithms, such as a convolutional long short-term memory neural network, a support vector machine, and a random forest, to build quantitative structure-activity relationship (QSAR) models on toxicity data sets. The results obtained on toxicity data for the ciliate Tetrahymena pyriformis (IGC50) and on acute rat toxicity data expressed as the median lethal dose (LD50) show that our approach could ultimately be used to predict the activities of chemical compounds efficiently. All material used in this study is available online through GitHub (https://github.com/BoukeliaAbdelbasset/NLPDeepQSAR.git).
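
To make the first step of the pipeline concrete, the sketch below shows one common way to treat a SMILES string as a "sentence": split it into tokens, build a vocabulary, and encode each compound as a fixed-length sequence of integer ids that an embedding layer could then map to vectors. The abstract does not describe the tokenization used in the study, so the regular expression, the function names (`tokenize`, `build_vocab`, `encode`), the `<pad>`/`<unk>` entries, and the padding length are all assumptions made for illustration, not the authors' implementation.

```python
import re

# Assumed token pattern (not taken from the paper): bracket atoms,
# two-letter halogens, two-digit ring closures, single-letter atoms,
# ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/().:~@*$]"
)

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify full coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list):
    """Map every token seen in the corpus to an integer id."""
    seen = set()
    for smiles in smiles_list:
        seen.update(tokenize(smiles))
    vocab = {PAD: 0, UNK: 1}
    for token in sorted(seen):  # sorted for a reproducible id assignment
        vocab[token] = len(vocab)
    return vocab


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a padded or truncated list of token ids."""
    ids = [vocab.get(token, vocab[UNK]) for token in tokenize(smiles)]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


if __name__ == "__main__":
    corpus = ["CCO", "ClCCBr", "CC(=O)Oc1ccccc1C(=O)O"]
    vocab = build_vocab(corpus)
    for smiles in corpus:
        print(smiles, encode(smiles, vocab, max_len=24))
```

The resulting integer sequences are the usual input to a trainable embedding layer or to a word2vec-style model trained on the token "sentences"; which of these the study uses would need to be checked against the linked repository.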
