Improved biomedical document retrieval system with PubMed term statistics and expansions
Author(s) - Huian Li, Jake Y. Chen
Publication year - 2009
Publication title - International Journal of Computational Intelligence in Bioinformatics and Systems Biology
Language(s) - English
Resource type - Journals
eISSN - 1755-8042
pISSN - 1755-8034
DOI - 10.1504/ijcibsb.2009.024052
Subject(s) - term (time) , information retrieval , medical statistics , computer science , statistics , mathematics , physics , quantum mechanics
Large biomedical abstract databases such as MEDLINE enable users to search large bodies of biomedical knowledge quickly. In this study, we describe a new framework to improve the performance of MEDLINE document retrieval. We first analysed and built normalized term frequency distributions for 1.8 million terms by sampling 1,500,000 MEDLINE abstracts. Then, we developed a statistical model to identify significantly observed terms ('gists') in a document as additional document keywords to help improve retrieval precision. To improve recall, we integrated several biological ontologies that can expand user queries with semantically compatible terms. The framework was implemented in Oracle 10g.
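The abstract does not specify the statistical model or the ontologies used, so the following Python sketch is only an illustration of the two ideas it names. It assumes a simple z-score test against a per-term background mean and standard deviation for 'gist' detection, and a plain synonym lookup for query expansion. The function names, the threshold and the sample data are all hypothetical and are not taken from the paper, which was implemented in Oracle 10g.

```python
from collections import Counter


def find_gists(doc_tokens, background, z_threshold=2.0):
    """Return (term, z) pairs for terms that are over-represented in a document.

    ASSUMPTION: `background` maps a term to (mean, std) of its normalized
    frequency across a sample of abstracts. The z-score test is a stand-in;
    the paper's actual statistical model is not described in the abstract.
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    gists = []
    for term, count in counts.items():
        stats = background.get(term)
        if stats is None:
            continue  # no background statistics for this term
        mean, std = stats
        if std <= 0:
            continue  # cannot score a term with zero variance
        z = (count / total - mean) / std
        if z >= z_threshold:
            gists.append((term, z))
    # Most strongly over-represented terms first
    return sorted(gists, key=lambda pair: -pair[1])


def expand_query(terms, synonyms):
    """Expand query terms with semantically compatible terms.

    ASSUMPTION: `synonyms` is a hypothetical dict standing in for an
    ontology lookup; original order is kept and duplicates are dropped.
    """
    expanded = []
    for term in terms:
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Illustrative data only, not drawn from MEDLINE
background = {"kinase": (0.01, 0.02), "the": (0.3, 0.05)}
doc = ["the", "kinase", "kinase", "the", "binds",
       "kinase", "the", "kinase", "pathway", "kinase"]
gist_terms = [term for term, _ in find_gists(doc, background)]
query = expand_query(["tumor"], {"tumor": ["neoplasm", "tumour"]})
```

In this sketch, gist terms would be stored as extra document keywords to favour precision, while the expanded query widens matching to favour recall, mirroring the split described in the abstract.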