Open Access
Document Retrieval for Precision Medicine Using a Deep Learning Ensemble Method
Author(s) -
Zhiqiang Liu,
Jingkun Feng,
Zhihao Yang,
Lei Wang
Publication year - 2021
Publication title -
JMIR Medical Informatics
Language(s) - English
Resource type - Journals
ISSN - 2291-9694
DOI - 10.2196/28272
Subject(s) - computer science, information retrieval, query expansion, relevance (law), ranking (information retrieval), boosting (machine learning), learning to rank, document retrieval, matching (statistics), relevance feedback, context (archaeology), search engine, data mining, artificial intelligence, image retrieval, paleontology, statistics, mathematics, biology, political science, law, image (mathematics)
Background: With the development of biomedicine, the number of biomedical documents has increased rapidly, making it harder for researchers to retrieve the information they need. Information retrieval aims to meet this challenge by retrieving the documents relevant to a given query from a large collection. However, in some retrieval tasks the relevance of search results must be evaluated from multiple aspects, which makes biomedical information retrieval more difficult.
Objective: This study aimed to find a more systematic method for retrieving relevant scientific literature for a given patient.
Methods: In the initial retrieval stage, we supplemented query terms through query expansion strategies and applied query boosting to obtain an initial ranking list of relevant documents. In the re-ranking stage, we employed a text classification model and a relevance matching model to evaluate documents from different dimensions, and then combined their outputs through logistic regression to re-rank all the documents in the initial ranking list.
Results: The proposed ensemble method improved biomedical retrieval performance. Compared with existing deep learning–based methods, our method achieved state-of-the-art performance in experiments on the data collection provided by the Text Retrieval Conference (TREC) 2019 Precision Medicine Track.
Conclusions: In this paper, we proposed a novel ensemble method based on deep learning. The experiments showed that the strategies used in the initial retrieval stage, such as query expansion and query boosting, are effective. Applying the text classification model and the relevance matching model captured semantic context information better and improved retrieval performance.
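The two-stage pipeline described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the boost values, the synonym table, and the logistic regression weights are all hypothetical, and the two per-document scores stand in for the outputs of the text classification and relevance matching models. In practice the weights would be fitted on labeled relevance judgments rather than set by hand.

```python
import math
from typing import Dict, List, Tuple


def expand_and_boost(
    terms: List[str],
    synonyms: Dict[str, List[str]],
    original_boost: float = 2.0,   # assumed value, not from the paper
    expansion_boost: float = 1.0,  # assumed value, not from the paper
) -> List[Tuple[str, float]]:
    """Query expansion with boosting: keep each original term with a higher
    weight and append its synonyms with a lower weight."""
    weighted: List[Tuple[str, float]] = []
    for term in terms:
        weighted.append((term, original_boost))
        for synonym in synonyms.get(term, []):
            weighted.append((synonym, expansion_boost))
    return weighted


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rerank(
    candidates: List[Tuple[str, float, float]],
    w_cls: float,
    w_rel: float,
    bias: float,
) -> List[Tuple[str, float]]:
    """Combine a classification score and a relevance matching score for each
    document with a logistic function, then sort by the combined probability.

    candidates: (doc_id, classification_score, relevance_score) triples taken
    from the initial ranking list.
    """
    scored = [
        (doc_id, sigmoid(w_cls * cls_score + w_rel * rel_score + bias))
        for doc_id, cls_score, rel_score in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical query terms and synonym table for illustration only.
    query = expand_and_boost(["melanoma"], {"melanoma": ["skin cancer"]})
    print(query)

    # Hypothetical model scores for three documents from the initial list.
    initial_list = [("d1", 0.2, 0.9), ("d2", 0.8, 0.7), ("d3", 0.1, 0.1)]
    for doc_id, probability in rerank(initial_list, w_cls=1.0, w_rel=1.0, bias=0.0):
        print(doc_id, round(probability, 3))
```

With equal weights, a document that scores moderately well on both models (d2) ranks above one that scores highly on only one of them (d1), which reflects the idea of judging relevance along more than one dimension.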