A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation
Author(s) - Youcheng Pan, Chenghao Wang, Baotian Hu, Yang Xiang, Xiaolong Wang, Qingcai Chen, Junjie Chen, Jingcheng Du
Publication year - 2021
Publication title - JMIR Medical Informatics
Language(s) - English
Resource type - Journals
ISSN - 2291-9694
DOI - 10.2196/32698
Subject(s) - computer science, SQL, information retrieval, stored procedure, query by example, programming language, natural language processing, artificial intelligence, database, data mining, search engine, web search query
Background: Electronic medical records (EMRs) are usually stored in relational databases, so retrieving information of interest requires SQL queries. Writing such queries can be challenging for medical experts who lack database expertise. Existing text-to-SQL generation methods have not been widely adopted in the medical domain.

Objective: This study aimed to propose a neural generation model that jointly considers the characteristics of medical text and the structure of SQL in order to automatically transform medical texts into SQL queries for EMRs.

Methods: We proposed a medical text-to-SQL model (MedTS) that uses a pretrained Bidirectional Encoder Representations From Transformers (BERT) model as the encoder and a grammar-based long short-term memory (LSTM) network as the decoder. The decoder predicts an intermediate representation that can easily be transformed into the final SQL query. We adopted a syntax tree as the intermediate representation rather than treating the SQL query as an ordinary word sequence; this better matches the tree-structured nature of SQL and effectively reduces the search space during generation. Experiments were conducted on the MIMICSQL dataset, and MedTS was compared with 5 competitor methods.

Results: On the test set, MedTS achieved a logic form accuracy of 0.784 and an execution accuracy of 0.899, significantly outperforming the existing state-of-the-art methods. Further analyses showed that performance was relatively balanced across the components of the generated SQL, with substantial improvements on each.

Conclusions: The proposed MedTS was effective and robust in improving the performance of medical text-to-SQL generation, indicating strong potential for application in real medical scenarios.
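
To make the idea of a tree-shaped intermediate representation concrete, the following is a minimal sketch of how a small syntax tree might be rendered into a SQL string. It is not the MedTS grammar, which the abstract does not specify: the node types (`Select`, `Column`, `Condition`), the function `to_sql`, and the table and column names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


# Hypothetical node types; the actual MedTS grammar is not given in the abstract.
@dataclass
class Column:
    name: str
    agg: Optional[str] = None  # optional aggregate such as "COUNT"


@dataclass
class Condition:
    column: str
    op: str
    value: Union[str, int]


@dataclass
class Select:
    columns: List[Column]
    table: str
    conditions: List[Condition] = field(default_factory=list)


def render_column(col: Column) -> str:
    """Render a column node, wrapping it in its aggregate if one is set."""
    return f"{col.agg}({col.name})" if col.agg else col.name


def render_condition(cond: Condition) -> str:
    """Render a condition node; string values are quoted, integers are not."""
    value = f"'{cond.value}'" if isinstance(cond.value, str) else str(cond.value)
    return f"{cond.column} {cond.op} {value}"


def to_sql(tree: Select) -> str:
    """Walk the tree top-down and emit the corresponding SQL string."""
    sql = "SELECT " + ", ".join(render_column(c) for c in tree.columns)
    sql += " FROM " + tree.table
    if tree.conditions:
        sql += " WHERE " + " AND ".join(render_condition(c) for c in tree.conditions)
    return sql


# Illustrative question: "How many female patients are younger than 60?"
# Table and column names here are assumptions, not taken from MIMICSQL.
tree = Select(
    columns=[Column("subject_id", agg="COUNT")],
    table="demographic",
    conditions=[Condition("gender", "=", "F"), Condition("age", "<", 60)],
)
print(to_sql(tree))
```

Because a decoder that emits such a tree only chooses among node types that are valid at each position, it cannot produce structurally malformed queries, which is one way a tree representation can narrow the search space compared with free-form token generation.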
