
Cross-lingual transfer learning in drug-related information extraction from user-generated texts
Author(s) -
Andrey Sakhovskiy,
Elena Tutubalina
Publication year - 2021
Publication title -
Trudy Instituta Sistemnogo Programmirovaniya RAN (Proceedings of ISP RAS)
Language(s) - English
Resource type - Journals
eISSN - 2220-6426
pISSN - 2079-8156
DOI - 10.15514/ispras-2021-33(6)-15
Subject(s) - computer science , natural language processing , named entity recognition , information extraction , transfer learning , multi-label classification , f1 score , artificial intelligence , machine learning , information retrieval
Aggregating knowledge about drug, disease, and drug-reaction entities across a broad range of domains and languages is critical for information extraction (IE) applications. In this work, we present a fine-grained evaluation intended to understand the efficiency of multilingual BERT-based models for biomedical named entity recognition (NER) and multi-label sentence classification. We investigate transfer learning (TL) strategies between two English corpora and a novel annotated corpus of Russian reviews about drug therapy. Each sentence is labelled for the presence or absence of health-related issues; sentences that mention an issue are additionally annotated at the expression level with fine-grained subtypes such as drug names, drug indications, and drug reactions. Evaluation results demonstrate that BERT pre-trained on Russian and English raw reviews (5M in total) shows the best transfer capabilities for adverse drug reaction extraction on Russian data. Our RuDR-BERT model achieves a macro F1 score of 74.85% on the NER task. On the classification task, our EnRuDR-BERT model achieves a macro F1 score of 70%, an 8.64% gain over a general-domain BERT model.
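Both reported results use the macro-averaged F1 score, which gives rare entity types (such as drug reactions) the same weight as frequent ones. As an illustration of the metric only, a minimal sketch of macro F1 over toy labels with hypothetical entity-type names (not the paper's actual label set or data):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: mean of per-class F1 over all classes in the gold labels."""
    classes = sorted(set(y_true))
    per_class_f1 = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Per-class F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class_f1.append(f1)
    # Each class contributes equally, regardless of its frequency.
    return sum(per_class_f1) / len(per_class_f1)

# Toy gold and predicted labels (illustrative only).
y_true = ["drug", "indication", "reaction", "drug", "reaction", "indication"]
y_pred = ["drug", "indication", "drug", "drug", "reaction", "reaction"]
print(round(macro_f1(y_true, y_pred), 4))  # → 0.6556
```

Because the macro average weights each class equally, a model that performs well only on the majority class scores noticeably lower than under micro averaging, which is why it is the standard choice for imbalanced biomedical label sets.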