
Overview of Long-form Document Matching: Survey of Existing Models and Their Challenges
Author(s) -
Yaokai Cheng,
Ruoyu Chen,
Xiaoguang Yuan,
Yuting Yang,
Shan Jiang,
Bo Yang
Publication year - 2022
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2171/1/012059
Subject(s) - matching (statistics) , computer science , information retrieval , document clustering , cluster analysis , field (mathematics) , natural language processing , artificial intelligence , mathematics , statistics , pure mathematics
Long-form document matching is an important research direction in natural language processing, with applications in tasks such as news recommendation and text clustering. However, long texts are noisy and their semantic information is sparse, which makes matching difficult, and applying short-form document matching methods to long-form matching problems gives unsatisfactory results. The problem has therefore attracted the attention of researchers, who have proposed many effective methods. These methods fall into three categories: traditional bag-of-words-based models, traditional deep learning-based models, and pre-training-based models. This study reviews typical methods for long-form document matching, analyzes their advantages and disadvantages, and discusses possible future developments.
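
As an illustration of the first category only, the following is a minimal sketch of bag-of-words matching: each document is reduced to term counts, and two documents are scored by the cosine similarity of those counts. It is not taken from the surveyed paper; the function names (`bag_of_words`, `cosine_similarity`, `match_score`) and the whitespace tokenization are assumptions chosen for brevity.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each term.

    Whitespace tokenization is an assumption made for this sketch.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document matches nothing
    return dot / (norm_a * norm_b)


def match_score(doc_a, doc_b):
    """Hypothetical helper: score two documents by bag-of-words overlap."""
    return cosine_similarity(bag_of_words(doc_a), bag_of_words(doc_b))
```

A score near 1 means the two documents use nearly the same terms in the same proportions, and a score of 0 means they share no terms. Because word order and context are discarded, a representation of this kind is sensitive to the noise and sparsity that the abstract attributes to long texts.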