Mining hidden knowledge: embedding models of cause–effect relationships curated from the biomedical literature

A. Krämer, Jeff Green, Jean-Noël Billaud, Nicoleta Andreea Pasare, Martin Jones, Stuart Tugendreich

Open Access

Publication year - 2022

Publication title - Bioinformatics Advances

Language(s) - English

Resource type - Journals

ISSN - 2635-0041

DOI - 10.1093/bioadv/vbac022

Subject(s) - computer science, embedding, inference, computational biology, machine learning, function (biology), python (programming language), artificial intelligence, context (archaeology), gene regulatory network, biological network, gene, data mining, biology, gene expression, genetics, paleontology, operating system

Motivation

We explore the use of literature-curated signed causal gene expression and gene–function relationships to construct unsupervised embeddings of genes, biological functions and diseases. Our goal is to prioritize and predict activating and inhibiting functional associations of genes and to discover hidden relationships between functions. As an application, we are particularly interested in the automatic construction of networks that capture relevant biology in a given disease context.

Results

We evaluated several unsupervised gene embedding models leveraging literature-curated signed causal gene expression findings. Using linear regression, we show that, based on these gene embeddings, gene–function relationships can be predicted with about 95% precision for the highest scoring genes. Function embedding vectors, derived from parameters of the linear regression model, allow inference of relationships between different functions or diseases. We show for several diseases that gene and function embeddings can be used to recover key drivers of pathogenesis, as well as underlying cellular and physiological processes. These results are presented as disease-centric networks of genes and functions. To illustrate the applicability of our approach to other machine learning tasks, we also computed embeddings for drug molecules, which were then tested using a simple neural network to predict drug–disease associations.

Availability and implementation

Python implementations of the gene and function embedding algorithms operating on a subset of our literature-curated content as well as other code used for this paper are made available as part of the Supplementary data.

Supplementary information

Supplementary data are available at Bioinformatics Advances online.
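The regression step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (their code is in the Supplementary data); it only shows the general idea of fitting one weight vector per function on top of fixed gene embeddings, then treating that weight vector as a function embedding. The gene names, the three-dimensional vectors, the signed labels, the ridge penalty and the gradient-descent settings are all hypothetical choices made for illustration.

```python
import math

# Hypothetical 3-dimensional gene embeddings (toy values, not from the paper).
gene_vecs = {
    "GENE_A": [1.0, 0.0, 0.2],
    "GENE_B": [0.9, 0.1, 0.0],
    "GENE_C": [-1.0, 0.1, 0.1],
    "GENE_D": [-0.8, -0.2, 0.0],
    "GENE_E": [0.7, 0.0, -0.1],   # held out, no label
    "GENE_F": [-0.9, 0.0, 0.1],   # held out, no label
}

# Hypothetical signed gene-function labels: +1 activating, -1 inhibiting.
labels_x = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1, "GENE_D": -1}
# A second toy function with exactly opposite signs.
labels_y = {gene: -sign for gene, sign in labels_x.items()}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def fit_function_vector(labels, l2=0.1, lr=0.1, steps=2000):
    """Ridge-regularized linear regression fitted by plain gradient descent.

    Returns the weight vector, used here as the function's embedding.
    """
    dim = len(next(iter(gene_vecs.values())))
    w = [0.0] * dim
    n = len(labels)
    for _ in range(steps):
        grad = [l2 * wi for wi in w]              # gradient of the L2 penalty
        for gene, y in labels.items():
            x = gene_vecs[gene]
            err = dot(w, x) - y                   # residual for this gene
            for k in range(dim):
                grad[k] += err * x[k] / n         # mean squared-error gradient
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def score(gene, w):
    """Predicted signed association of a gene with a function."""
    return dot(gene_vecs[gene], w)


w_x = fit_function_vector(labels_x)
w_y = fit_function_vector(labels_y)

for gene in ("GENE_E", "GENE_F"):
    print(gene, round(score(gene, w_x), 3))
print("cosine(X, Y) =", round(cosine(w_x, w_y), 3))
```

In this toy setting, the sign of `score` plays the role of the predicted activating or inhibiting association for an unlabeled gene, and the cosine similarity between two fitted weight vectors stands in for the inferred relationship between two functions. Ranking genes by score and keeping only the highest-scoring ones would correspond to the "highest scoring genes" regime mentioned in the Results.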
