Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications

Olga Tarasova, Nadezhda Biziukova, Dmitry Filimonov, Vladimir Poroikov, Marc C. Nicklaus

Open Access

Publication year - 2019

Publication title - Journal of Chemical Information and Modeling

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.24

H-Index - 160

eISSN - 1549-960X

pISSN - 1549-9596

DOI - 10.1021/acs.jcim.9b00164

Subject(s) - computer science, biological data, categorization, drug discovery, process (computing), biological activity, data extraction, data mining, information retrieval, quality (philosophy), bioassay, experimental data, data science, artificial intelligence, chemistry, bioinformatics, biology, mathematics, medline, biochemistry, philosophy, genetics, epistemology, in vitro, operating system, statistics

Large amounts of high-quality data on the biological activity of chemical compounds are required throughout the drug discovery process, from the development of computational models of structure-activity relationships to the experimental testing of lead compounds and their clinical validation. Currently, a large amount of such data is available from databases, scientific publications, and patents. Biological data are characterized by incompleteness, uncertainty, and low reproducibility. Although free and commercial databases of the biological activities of compounds exist, they usually lack unambiguous information about the specifics of the biological assays. Scientific papers, on the other hand, are the primary source of new data disclosed to the scientific community for the first time. In this study, we have developed and validated a data-mining approach for extracting text fragments that contain descriptions of bioassays. We have used this approach to evaluate compounds and their biological activity reported in scientific publications. We have found that papers can be categorized as relevant or irrelevant based on machine-learning analysis of their abstracts. Text fragments extracted from the full texts of publications allow their further partitioning into several classes according to the specifics of the bioassays. We demonstrate the applicability of our approach to the comparison of the endpoint values of biological activity and cytotoxicity of reference compounds.
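The abstract-level categorization step described above can be illustrated with a minimal sketch. The abstract does not name the machine-learning method used, so the multinomial naive Bayes classifier below is an assumption chosen for brevity, not the authors' implementation. The class name `NaiveBayesAbstractClassifier`, the `tokenize` helper, and the four toy training abstracts are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesAbstractClassifier:
    """Multinomial naive Bayes with add-one smoothing (an assumed method)."""

    def fit(self, abstracts, labels):
        # Count documents per class and word occurrences per class.
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for text, label in zip(abstracts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        # Unnormalized log posterior for each class; unseen words are ignored.
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy abstracts; a real study would need a curated training set.
TRAIN = [
    ("Compounds were tested in a cell-based assay and IC50 values against "
     "HIV-1 reverse transcriptase were measured.", "relevant"),
    ("Cytotoxicity (CC50) and antiviral activity (EC50) of the inhibitors "
     "were determined in MT-4 cells.", "relevant"),
    ("We review the epidemiology of viral infection and public health "
     "policy across regions.", "irrelevant"),
    ("A survey of patient adherence to therapy and quality of life "
     "outcomes.", "irrelevant"),
]

clf = NaiveBayesAbstractClassifier().fit(
    [text for text, _ in TRAIN], [label for _, label in TRAIN]
)

print(clf.predict(
    "The inhibitors were tested in a cell assay and EC50 values were measured."
))
```

With only four training texts the classifier is a toy. The point is the workflow: learn per-class word statistics from labeled abstracts, then score unseen abstracts before any full-text processing is attempted.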
