Corpus domain effects on distributional semantic modeling of medical terms
Author(s) -
Serguei Pakhomov,
Greg P. Finley,
Reed McEwan,
Yan Wang,
Genevieve B. Melton
Publication year - 2016
Publication title -
Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btw529
Subject(s) - computer science , semantic similarity , benchmark (surveying) , semantics (computer science) , information retrieval , distributional semantics , natural language processing , similarity (geometry) , task (project management) , artificial intelligence , unified medical language system , domain (mathematical analysis) , biomedical text mining , text mining , mathematical analysis , mathematics , management , geodesy , economics , image (mathematics) , programming language , geography
Automatically quantifying semantic similarity and relatedness between clinical terms is an important aspect of text mining from electronic health records, which are increasingly recognized as valuable sources of phenotypic information for clinical genomics and bioinformatics research. A key obstacle to the development of semantic relatedness measures is the limited availability of large quantities of clinical text to researchers and developers outside of major medical centers. Text from general English and biomedical literature is freely available; however, its validity as a substitute for clinical-domain text in representing the semantics of clinical terms remains to be demonstrated.