
Improving domain adaptation in de-identification of electronic health records through self-training
Author(s) -
S. Matthew Liao,
Jamie Kiros,
Jian-Zhang Chen,
Zhaolei Zhang,
Ting Chen
Publication year - 2021
Publication title -
Journal of the American Medical Informatics Association
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.614
H-Index - 150
eISSN - 1527-974X
pISSN - 1067-5027
DOI - 10.1093/jamia/ocab128
Subject(s) - computer science, identification (biology), software deployment, domain (mathematical analysis), domain adaptation, artificial intelligence, task (project management), machine learning, adaptation (eye), test data, data mining, classifier (uml), mathematical analysis, botany, mathematics, biology, physics, management, optics, economics, programming language, operating system
De-identification, the removal of protected health information entities, is a fundamental task in processing electronic health records. Deep learning models have proven to be promising tools for automating de-identification. However, when the target domain (where the model is applied) differs from the source domain (where the model is trained), the model often suffers a significant performance drop, a problem commonly referred to as the domain adaptation issue. In de-identification, domain adaptation issues can make a model unreliable in deployment. In this work, we aim to close the domain gap by leveraging unlabeled data from the target domain.
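To illustrate the general idea named in the title, the following is a minimal sketch of a self-training (pseudo-labeling) loop. It is not the authors' method: the abstract describes deep learning models, whereas this toy swaps in a nearest-centroid classifier so the example stays self-contained. The single numeric feature per token, the labels "PHI" (protected health information) and "O" (other), the confidence threshold, and all function names are assumptions made for illustration only.

```python
import math

def fit_centroids(xs, ys):
    """Nearest-centroid classifier on 1-D features: one mean per label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_with_confidence(centroids, x):
    """Return (label, confidence) from a softmax over negative distances."""
    scores = {y: math.exp(-abs(x - c)) for y, c in centroids.items()}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total

def self_train(src_x, src_y, tgt_x, threshold=0.8, rounds=5):
    """Train on labeled source data, then repeatedly add confident
    pseudo-labels from the unlabeled target data and refit."""
    centroids = fit_centroids(src_x, src_y)
    for _ in range(rounds):
        pseudo_x, pseudo_y = [], []
        for x in tgt_x:
            label, conf = predict_with_confidence(centroids, x)
            if conf >= threshold:  # keep only confident predictions
                pseudo_x.append(x)
                pseudo_y.append(label)
        if not pseudo_x:
            break
        # Refit on source labels plus the current pseudo-labels.
        centroids = fit_centroids(src_x + pseudo_x, src_y + pseudo_y)
    return centroids

# Hypothetical toy data: the target domain is shifted relative to the source.
src_x = [-0.5, 0.0, 0.5, 3.5, 4.0, 4.5]
src_y = ["O", "O", "O", "PHI", "PHI", "PHI"]
tgt_x = [1.0, 1.5, 2.0, 5.0, 5.5, 6.0]  # unlabeled

source_only = fit_centroids(src_x, src_y)
adapted = self_train(src_x, src_y, tgt_x)
```

In this toy setup, the centroids move toward the target distribution over successive rounds, so a borderline target token can receive a different label from the adapted model than from the source-only model. The threshold controls the trade-off: a lower value adds more pseudo-labels per round but admits more label noise.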