A natural language processing approach based on embedding deep learning from heterogeneous compounds for quantitative structure–activity relationship modeling

Bouhedjar Khalid; Boukelia Abdelbasset; Khorief Nacereddine Abdelmalek; Boucheham Anouar; Belaidi Amine; Djerourou Abdelhafid

Publication year - 2020

Publication title - Chemical Biology and Drug Design

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.59

H-Index - 77

eISSN - 1747-0285

pISSN - 1747-0277

DOI - 10.1111/cbdd.13742

Subject(s) - computer science , embedding , deep learning , artificial intelligence , convolutional neural network , word embedding , machine learning , artificial neural network , semantics (computer science) , word (group theory) , programming language , mathematics , geometry

Over the past decade, rapid developments in biological and chemical technologies, such as high-throughput screening and parallel synthesis, have significantly increased the amount of available data, which calls for developing and integrating new analytical methods, especially deep learning models. Recently, there has been growing interest in applying deep learning to computer-aided drug discovery because of its exceptional success in many other fields. The present work proposes a natural language processing approach based on embedding deep neural networks. Our method transforms compounds written in the Simplified Molecular Input Line Entry System (SMILES) format into word embedding vectors that represent the semantics of the compounds. These vectors are fed into supervised machine learning algorithms, such as a convolutional long short-term memory neural network, a support vector machine, and a random forest, to build quantitative structure–activity relationship (QSAR) models on toxicity data sets. The results obtained on toxicity data for the ciliate Tetrahymena pyriformis (IGC50) and on acute rat toxicity data expressed as the median lethal dose in treated rats (LD50) show that our approach can potentially be used to predict the activities of chemical compounds efficiently. All material used in this study is available online through the GitHub portal (https://github.com/BoukeliaAbdelbasset/NLPDeepQSAR.git).
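To make the SMILES-to-embedding idea concrete, here is a minimal, dependency-free Python sketch: it splits SMILES strings into tokens (treated as "words"), builds count-based token embeddings from co-occurrence statistics weighted by positive pointwise mutual information (PPMI), and averages the token vectors into one fixed-length vector per compound. This is not the authors' implementation (their code is in the GitHub repository above). The tokenizer regex, the window size, the PPMI weighting used in place of a neural embedding, the averaging step, and all helper names (`tokenize`, `ppmi_embeddings`, `compound_vector`, `cosine`) are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumed SMILES tokenizer: bracket atoms, two-letter halogens, two-digit ring
# closures, organic-subset atoms (aliphatic and aromatic), bonds/branches, digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[=#$/\\()+\-.:~*@]|\d"
)


def tokenize(smiles):
    """Split a SMILES string into tokens, failing loudly on unmatched characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


def ppmi_embeddings(corpus, window=2):
    """Count-based token embeddings: one PPMI-weighted co-occurrence row per token."""
    counts = Counter()
    for tokens in corpus:
        for i, center in enumerate(tokens):
            # Count every context token within `window` positions of the center token.
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[(center, tokens[j])] += 1
    vocab = sorted({t for tokens in corpus for t in tokens})
    row_totals, col_totals = Counter(), Counter()
    for (w, c), n in counts.items():
        row_totals[w] += n
        col_totals[c] += n
    total = sum(counts.values())
    emb = {}
    for w in vocab:
        vec = []
        for c in vocab:
            n = counts[(w, c)]
            # PMI = log(P(w, c) / (P(w) P(c))); negative values are clipped to 0.
            pmi = math.log(n * total / (row_totals[w] * col_totals[c])) if n else 0.0
            vec.append(max(pmi, 0.0))
        emb[w] = vec
    return vocab, emb


def compound_vector(smiles, emb, dim):
    """Average the embeddings of known tokens into one fixed-length compound vector."""
    known = [t for t in tokenize(smiles) if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(emb[t][k] for t in known) / len(known) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy corpus of structures only (ethanol, 1-propanol, benzene, phenol,
# 1,2-dichloroethane); no activity values are implied.
smiles_list = ["CCO", "CCCO", "c1ccccc1", "c1ccccc1O", "ClCCCl"]
vocab, emb = ppmi_embeddings([tokenize(s) for s in smiles_list])
X = [compound_vector(s, emb, len(vocab)) for s in smiles_list]
print(vocab)
print(round(cosine(X[0], X[1]), 3), round(cosine(X[0], X[2]), 3))
```

In this toy corpus, ethanol's vector lies much closer to 1-propanol's than to benzene's, which is the kind of structure-aware similarity the embedding step is meant to capture. In a real QSAR pipeline, the rows of `X` would be paired with measured endpoints such as IGC50 or LD50 and passed to regressors such as scikit-learn's `SVR` or `RandomForestRegressor`. A learned neural embedding (for example, a word2vec-style model) would usually replace the PPMI rows, whose length grows with the vocabulary size.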
