Formatting biological big data for modern machine learning in drug discovery

Author(s) - Duran-Frigola Miquel, Fernández-Torras Adrià, Bertoni Martino, Aloy Patrick

Publication year - 2018

Publication title - Wiley Interdisciplinary Reviews: Computational Molecular Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 5.126

H-Index - 81

eISSN - 1759-0884

pISSN - 1759-0876

DOI - 10.1002/wcms.1408

Subject(s) - cheminformatics, computer science, drug discovery, biological data, machine learning, artificial intelligence, biological network, data science, implementation, pipeline (software), bioinformatics, software engineering, programming language, biology

Biological data is accumulating at an unprecedented rate, escalating the role of data‐driven methods in computational drug discovery. This scenario is favored by recent advances in machine learning algorithms, which are optimized for huge datasets and consistently beat the predictive performance of previous art, rapidly approaching human expert reasoning. The urge to couple biological data to cutting‐edge machine learning has spurred developments in data integration and knowledge representation, especially in the form of heterogeneous, multiplex and semantically‐rich biological networks. Today, thanks to the propitious rise in knowledge embedding techniques, these large and complex biological networks can be converted to a vector format that suits the majority of machine learning implementations. Here, we explain why this can be particularly transformative for drug discovery where, for decades, customary chemoinformatics methods have employed vector descriptors of compound structures as the standard input of their prediction tasks. A common vector format to represent biology and chemistry may push biological information into most of the existing steps of the drug discovery pipeline, boosting the accuracy of predictions and uncovering connections between small molecules and other biological entities such as targets or diseases.

This article is categorized under:
Computer and Information Science > Databases and Expert Systems
Computer and Information Science > Chemoinformatics
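As a rough illustration of the idea in the abstract, the sketch below turns a tiny biological network into per-node vectors and places them next to structure-based descriptors of the same compounds. It is a minimal sketch under stated assumptions, not the authors' method: the network, the node roles, and the 8-bit fingerprints are hypothetical toy data, and a truncated SVD of the adjacency matrix stands in for the knowledge embedding techniques the abstract refers to.

```python
import numpy as np


def embed_network(adjacency: np.ndarray, dim: int) -> np.ndarray:
    """Return one `dim`-dimensional vector per node.

    Assumption: a truncated SVD of the adjacency matrix is used here as a
    simple stand-in for a network embedding method.
    """
    u, s, _ = np.linalg.svd(adjacency, full_matrices=False)
    # Scale the leading left singular vectors by the square root of
    # their singular values to obtain node coordinates.
    return u[:, :dim] * np.sqrt(s[:dim])


# Hypothetical toy network: nodes 0-1 are compounds, 2-4 are targets,
# and node 5 is a disease. Edges are undirected associations.
edges = [(0, 2), (0, 3), (1, 3), (1, 4), (2, 5), (3, 5)]
n_nodes = 6
adjacency = np.zeros((n_nodes, n_nodes))
for i, j in edges:
    adjacency[i, j] = adjacency[j, i] = 1.0

# Biological vectors: one row per node of the network.
bio_vectors = embed_network(adjacency, dim=3)

# Hypothetical 8-bit structural fingerprints for the two compounds,
# standing in for conventional chemoinformatics descriptors.
chem_vectors = np.array(
    [
        [1, 0, 1, 1, 0, 0, 1, 0],
        [0, 1, 1, 0, 0, 1, 1, 0],
    ],
    dtype=float,
)

# Because both representations are plain vectors, they can be
# concatenated into a single input for a downstream predictor.
combined = np.hstack([chem_vectors, bio_vectors[:2]])
print(combined.shape)
```

Each compound row in `combined` holds its structural bits followed by its network-derived coordinates, which is the kind of shared vector format the abstract describes as input for standard machine learning implementations.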
