Open Access
SciDeBERTa: Learning DeBERTa for Science Technology Documents and Fine-Tuning Information Extraction Tasks
Author(s) - Yuna Jeong, Eunhui Kim
Publication year - 2022
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/ACCESS.2022.3180830
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Deep learning-based language models (LMs) surpassed the gold standard (human baseline) on the SQuAD 1.1 and GLUE benchmarks in April and July 2019, respectively, and as of 2022 the top five LMs on the SuperGLUE leaderboard have also exceeded it. Yet even people with good general knowledge struggle with problems in specialized fields such as medicine and artificial intelligence. Just as humans acquire specialized knowledge through bachelor’s, master’s, and doctoral courses, LMs require a comparable process to develop an understanding of domain-specific knowledge. This study therefore proposes SciDeBERTa and SciDeBERTa(CS), pre-trained LMs (PLMs) specialized for the science and technology domain. We further pre-train DeBERTa, originally trained on a general corpus, with a science and technology corpus. Experiments verified that SciDeBERTa(CS), continually pre-trained on a computer science corpus, achieved 3.53% and 2.17% higher accuracy than SciBERT and S2ORC-SciBERT, respectively, two existing PLMs specialized for the science and technology domain, on the named entity recognition task of the SciERC dataset. On the JRE task of the SciERC dataset, SciDeBERTa(CS) outperformed the SCIIE baseline by 6.7%. On the Genia dataset, SciDeBERTa achieved the best performance among S2ORC-SciBERT, SciBERT, BERT, DeBERTa, and SciDeBERTa(CS). Furthermore, re-initialization techniques and optimizers developed after Adam were explored during fine-tuning to further verify the language understanding of the PLMs.
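As a rough illustration of the continual pre-training the abstract describes, the sketch below continues masked language modeling (MLM) of a general-domain DeBERTa checkpoint on a domain corpus with the Hugging Face transformers and datasets libraries. This is a minimal sketch under stated assumptions: the corpus file s2orc_cs_corpus.txt, the 15% masking rate, and all hyperparameters are illustrative placeholders, not the authors' actual training setup.

```python
# Minimal continual pre-training sketch: keep training a general-domain
# DeBERTa with the MLM objective on a domain corpus (hypothetical file,
# one document per line). Hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-base")

raw = load_dataset("text", data_files={"train": "s2orc_cs_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask tokens and predict them.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="scideberta-sketch",
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    num_train_epochs=1,  # domain-adaptive pre-training usually runs much longer
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
model.save_pretrained("scideberta-sketch")
```

The abstract also mentions exploring re-initialization during fine-tuning. One common variant of this idea re-draws the weights of the top K encoder layers before fine-tuning, so the upper layers shed pre-training specifics while the lower layers keep their general features. The sketch below shows that variant; the layer count K, the checkpoint name, and the label count are hypothetical values rather than the paper's settings.

```python
# Re-initialize the top K encoder layers of a token-classification model
# before fine-tuning (one re-initialization variant; values are assumptions).
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "microsoft/deberta-base",  # stand-in for a SciDeBERTa checkpoint
    num_labels=13,             # e.g., BIO tags for six entity types plus "O"
)

K = 2  # hypothetical number of top layers to re-initialize
for layer in model.deberta.encoder.layer[-K:]:
    # Re-draw parameters from the model's own weight initializer.
    layer.apply(model._init_weights)
```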
