Word embeddings for biomedical natural language processing: A survey
Author(s) - Chiu Billy, Baker Simon
Publication year - 2020
Publication title - Language and Linguistics Compass
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.619
H-Index - 44
ISSN - 1749-818X
DOI - 10.1111/lnc3.12402
Subject(s) - computer science , word (group theory) , word embedding , natural language processing , artificial intelligence , embedding , domain (mathematical analysis) , biomedicine , biomedical text mining , linguistics , text mining , mathematics , mathematical analysis , philosophy , biology , genetics
Abstract - Word representations are mathematical objects that capture the semantic and syntactic properties of words in a way that is interpretable by machines. Recently, encoding word properties into low‐dimensional vector spaces using neural networks has become increasingly popular. Word embeddings are now used as the main input to natural language processing (NLP) applications, achieving cutting‐edge results. Nevertheless, most word‐embedding studies are carried out with general‐domain text and evaluation datasets, and their results do not necessarily apply to text from other domains (e.g., biomedicine) that are linguistically distinct from general English. To achieve maximum benefit when using word embeddings for biomedical NLP tasks, they need to be induced and evaluated using in‐domain resources. Thus, it is essential to create a detailed review of biomedical embeddings that can be used as a reference for researchers to train in‐domain models. In this paper, we review biomedical word embedding studies from three key aspects: the corpora, models and evaluation methods. We first describe the characteristics of various biomedical corpora, and then compare popular embedding models. After that, we discuss different evaluation methods for biomedical embeddings. For each aspect, we summarize the various challenges discussed in the literature. Finally, we conclude the paper by proposing future directions that will help advance research into biomedical embeddings.