z-logo
open-access-imgOpen Access
Recognition of Disease Genetic Information from Unstructured Text Data Based on BiLSTM-CRF for Molecular Mechanisms
Author(s) -
Lejun Gong,
Xingxing Zhang,
Tianyin Chen,
Li Zhang
Publication year - 2021
Publication title -
security and communication networks
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.446
H-Index - 43
eISSN - 1939-0114
pISSN - 1939-0122
DOI - 10.1155/2021/6635027
Subject(s) - computer science , autism spectrum disorder , conditional random field , disease , autism , named entity recognition , artificial intelligence , kegg , biomedical text mining , natural language processing , machine learning , task (project management) , neurodevelopmental disorder , text mining , gene , medicine , biology , psychology , developmental psychology , pathology , biochemistry , gene expression , management , transcriptome , economics
Disease relevant entities are an important task in mining unstructured text data from the biomedical literature for achieving biomedical knowledge. Autism spectrum disorder (ASD) is a disease related to a neurological and developmental disorder characterized by deficits in communication and social interaction and by repetitive behaviour. However, this kind of disease remains unclear to date. In this study, it identifies entities associated with disease using the machine learning of a computational way from text data collection for molecular mechanisms related to ASD. Entities related to disease are extracted from the biomedical literature related to autism by using deep learning with bidirectional long short-term memory (BiLSTM) and conditional random field (CRF) model. Compared other previous works, the approach is promising for identifying entities related to disease. .e proposed approach including five types of molecular entities is evaluated by GENIA corpus to obtain an F-score of 76.81%. .e work has extracted 9146 proteins, 145 RNAs, 7680 DNAs, 1058 cell-types, and 981 cell-lines from the autism biomedical literature after removing repeated molecular entities. Finally, we perform GO and KEGG analyses of the test dataset. .is study could serve as a reference for further studies on the etiology of disease on the basis of molecular mechanisms and provide a way to explore disease genetic information.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom