Efficient Unsupervised Semantic Similarity through Dual-Masked Prompting
Author(s) -
Huilai Zou,
Minfeng Lu,
Qifei Zhang
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3611338
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Unsupervised text semantic similarity is an important task in natural language processing that aims to learn robust text representations without labeled data. A major challenge in this domain, particularly for contrastive learning methods, is efficiently generating a sufficient number of high-quality positive and negative samples. Existing approaches like SimCSE and PromptBERT have limitations in sample generation efficiency and volume. To address these limitations, we propose a novel dual-mask prompt template strategy that greatly increases the quantity and efficiency of positive and negative sample generation. Our method uniquely allows each template to simultaneously produce a positive and a negative sample. Furthermore, to eliminate noise interference caused by prompt templates, we introduce a simple difference-based noise separation technique. Concurrently, we extend the InfoNCE loss function to optimize the learning of feature space distributions. We used the SentEval toolkit to evaluate our method on seven standard text similarity datasets. The results show that, without any external data augmentation, our method achieves an average Spearman correlation coefficient of 78.52, performing comparably to or surpassing some classical methods. Comprehensive ablation studies rigorously validate the individual effectiveness of our dual-mask prompting, noise separation strategy, and extended loss function.
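To make the contrastive objective mentioned in the abstract more concrete, the following is a minimal sketch of the standard InfoNCE loss with in-batch negatives, preceded by a simple difference-based denoising step. It is not the authors' implementation: the abstract does not specify the extended loss or the exact form of the noise separation, so the function names (`remove_template_noise`, `info_nce`), the subtraction of a template-only embedding, and the temperature value of 0.05 are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def remove_template_noise(prompt_emb, template_only_emb):
    """Assumed form of difference-based denoising: subtract the embedding
    obtained from the prompt template alone from the sentence-in-template
    embedding. The paper's exact procedure may differ."""
    return prompt_emb - template_only_emb


def info_nce(anchors, positives, temperature=0.05):
    """Standard InfoNCE with in-batch negatives (not the paper's extension).

    Row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch serves as a negative.
    """
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = (a @ p.T) / temperature                      # shape (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sentence_in_template = rng.normal(size=(8, 16))   # hypothetical embeddings
    template_only = rng.normal(size=(16,))            # hypothetical template bias
    clean = remove_template_noise(sentence_in_template, template_only)
    # A slightly perturbed copy stands in for a second view of each sentence.
    second_view = clean + 0.01 * rng.normal(size=clean.shape)
    print("loss:", info_nce(clean, second_view))
```

In this sketch the loss is low when each anchor is most similar to its own positive and grows when the pairing is broken, which is the behavior a contrastive method of this kind relies on.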