Open Access

Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion

Author(s) - Chunyu Wang, Jie Zhang, XuePing Wang, Ke Han, Maozu Guo

Publication year - 2020

Publication title - Frontiers in Genetics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.413

H-Index - 81

ISSN - 1664-8021

DOI - 10.3389/fgene.2020.00005

Subject(s) - fusion, information fusion, computer science, algorithm, fusion gene, gene, computational biology, artificial intelligence, biology, genetics, philosophy, linguistics

Complex diseases seriously affect people's physical and mental health, and the discovery of disease-causing genes has become a major research target. Traditional biomedical methods for finding such genes suffer from long experimental periods and high costs. With the emergence of bioinformatics and the rapid development of biotechnology, researchers have proposed many gene prioritization algorithms that mine pathogenic genes from large amounts of biological data. However, the predictive performance of these algorithms is limited because the known gene–disease association matrix is still very sparse and contains no confirmed negative examples, that is, no evidence that particular genes and diseases are unrelated. Based on the hypothesis that mutations in functionally related genes may lead to similar disease phenotypes, this paper proposes a positive-unlabeled (PU) inductive matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract gene and disease features from multiple data sources, compensating for the sparsity of the association data. On the other hand, based on the prior knowledge that most unknown gene–disease pairs are unrelated, it uses a PU-learning strategy that treats the unlabeled pairs as negative examples for biased learning. In experiments, PUIMCHIF performed significantly better than other algorithms on three indexes: precision, recall, and mean percentile ranking (MPR). In the top-100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. PUIMCHIF thus prioritizes candidate genes better than other methods, such as IMC and CATAPULT.
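
The abstract does not give the objective function or the optimizer, so the following is only a minimal sketch of one common way to set up biased PU matrix completion with side features, not the authors' implementation. It assumes a score matrix of the form X W (Y H)^T, a squared loss in which unlabeled pairs are treated as negatives but down-weighted by a hypothetical factor `alpha`, and plain gradient descent with step halving. All names (`fit_pu_imc`, `objective`, `alpha`, `lam`) and the toy data are illustrative assumptions; the heterogeneous feature-learning stage of PUIMCHIF is not shown, and random features stand in for it.

```python
import numpy as np

def objective(P, X, Y, W, H, C, lam):
    """Weighted squared error plus L2 penalty on the two factor matrices."""
    S = X @ W @ (Y @ H).T
    return 0.5 * np.sum(C * (S - P) ** 2) + 0.5 * lam * (np.sum(W**2) + np.sum(H**2))

def fit_pu_imc(P, X, Y, rank=4, alpha=0.2, lam=0.1, n_iter=200, seed=0):
    """Biased PU matrix completion with gene features X and disease features Y.

    P     : (n_genes, n_diseases) 0/1 matrix; 1 = known association, 0 = unlabeled
    alpha : assumed weight (< 1) on unlabeled entries treated as negatives
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    C = np.where(P > 0, 1.0, alpha)          # per-entry confidence weights
    step = 1e-2
    losses = [objective(P, X, Y, W, H, C, lam)]
    for _ in range(n_iter):
        A, B = X @ W, Y @ H                  # latent gene / disease factors
        R = C * (A @ B.T - P)                # weighted residual
        grad_W = X.T @ (R @ B) + lam * W
        grad_H = Y.T @ (R.T @ A) + lam * H
        # Halve the step until the objective decreases, then accept the update.
        while step > 1e-12:
            W_new, H_new = W - step * grad_W, H - step * grad_H
            new_loss = objective(P, X, Y, W_new, H_new, C, lam)
            if new_loss < losses[-1]:
                W, H = W_new, H_new
                losses.append(new_loss)
                step *= 1.5
                break
            step *= 0.5
        else:
            break                            # no descent step found; stop early
    return W, H, losses

# Toy data (entirely synthetic): 30 genes, 20 diseases, random feature vectors.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 6))             # stand-in gene features
Y = rng.standard_normal((20, 5))             # stand-in disease features
true_scores = X @ rng.standard_normal((6, 5)) @ Y.T
P = (true_scores > np.quantile(true_scores, 0.9)).astype(float)

W, H, losses = fit_pu_imc(P, X, Y)
scores = X @ W @ (Y @ H).T                   # predicted association scores
ranking_for_disease_0 = np.argsort(-scores[:, 0])  # gene indices, best first
```

Setting `alpha` below 1 is what makes the learning "biased": a mismatch on a known association costs more than a mismatch on an unlabeled pair, which reflects the prior that most, but not all, unlabeled pairs are true negatives.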
