Open Access
METHOD FOR DETERMINING THE SEMANTIC SIMILARITY OF ARBITRARY LENGTH TEXTS USING THE TRANSFORMERS MODELS
Author(s) - Сергій Олізаренко, Viacheslav Radchenko
Publication year - 2021
Publication title - sučasnì ìnformacìjnì sistemi
Language(s) - English
Resource type - Journals
ISSN - 2522-9052
DOI - 10.20998/2522-9052.2021.2.18
Subject(s) - transformer, computer science, sentence, semantic similarity, similarity (geometry), artificial intelligence, sequence (biology), natural language processing, algorithm, pattern recognition (psychology), voltage, physics, quantum mechanics, biology, image (mathematics), genetics
The paper presents a method for determining the semantic similarity of arbitrary-length texts based on their vector representations. The vector representations are obtained with a multilingual Transformer model, and the task of determining the semantic similarity of arbitrary-length texts is treated as a text-sequence-pair classification problem solved with that model. A comparative analysis was performed to select the Transformer model best suited to this class of problems. The method has two main stages: (1) fine-tuning the Transformer model within the framework of the pretrained model's second task (sentence prediction); and (2) selecting and implementing a summarization method for text sequences longer than 512 (1024) tokens, so that semantic similarity can be determined for texts of arbitrary length.
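
The control flow described in the abstract can be illustrated with a minimal sketch: each text that exceeds the model's token limit is first shortened, and the resulting pair is then scored. Everything below is an assumption made for illustration and is not the authors' implementation. `tokenize` is a toy whitespace tokenizer, `toy_summarize` is a stand-in that keeps leading sentences rather than a real summarization method, and `toy_pair_score` uses Jaccard token overlap in place of a fine-tuned Transformer pair classifier. Only the 512-token limit comes from the abstract.

```python
from typing import Callable, List

MAX_TOKENS = 512  # input limit named in the abstract (1024 for some models)


def tokenize(text: str) -> List[str]:
    """Toy whitespace tokenizer; a real model would use its own subword tokenizer."""
    return text.lower().split()


def toy_summarize(text: str, max_tokens: int) -> str:
    """Hypothetical stand-in summarizer: keep leading sentences that fit the limit."""
    kept: List[str] = []
    used = 0
    for sentence in text.split(". "):
        n = len(tokenize(sentence))
        if used + n > max_tokens:
            break
        kept.append(sentence)
        used += n
    if not kept:
        # Even the first sentence is too long: fall back to hard truncation.
        return " ".join(tokenize(text)[:max_tokens])
    return ". ".join(kept)


def toy_pair_score(a: str, b: str) -> float:
    """Hypothetical stand-in for the pair classifier: Jaccard overlap of token sets."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def similarity(
    a: str,
    b: str,
    summarize: Callable[[str, int], str] = toy_summarize,
    score: Callable[[str, str], float] = toy_pair_score,
    max_tokens: int = MAX_TOKENS,
) -> float:
    """Shorten each text only if it exceeds max_tokens, then score the pair."""

    def fit(text: str) -> str:
        if len(tokenize(text)) <= max_tokens:
            return text
        return summarize(text, max_tokens)

    return score(fit(a), fit(b))
```

In a real system, the `summarize` and `score` arguments would be replaced by an actual summarization model and a fine-tuned sequence-pair classifier. Passing them in as callables keeps the length check independent of whichever models are chosen.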
