CMedMi: Text Similarity Detection of Chinese Medical Question Based on Mutual Information
Author(s) -
Minfeng Lu,
Qifei Zhang,
Huilai Zou
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3620313
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Chinese medical question similarity detection is a natural language processing task that assesses whether two Chinese medical questions are semantically similar. It has broad applications in medical information retrieval and question-answering systems. The task is difficult for two reasons: the inherent complexity and ambiguity of the Chinese language, and the specialized nature of the medical field. Different questions may refer to the same disease using varying terminology, and identical terms may carry different meanings in different contexts. To address these challenges, we propose a similarity detection method based on mutual information, which is used to extract richer textual information from the medical question corpus. We first obtain a text embedding vector from a Chinese pre-trained model and feed it into a text feature encoder to produce an encoded vector. Both the embedding vector and the encoded vector are then passed to two networks: a similarity detection network and a mutual information maximization network. By jointly optimizing the objective functions of these two networks, we obtain a refined encoded vector that is used to predict question similarity. Experiments on the TCAI20 and cMedQQ datasets demonstrate that the method effectively detects medical question similarity, achieving significantly improved performance compared with many traditional methods. These results highlight the feasibility and effectiveness of the proposed approach.
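The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state which mutual information estimator or similarity loss is used, so everything below is an assumption made for illustration: an InfoNCE-style contrastive bound stands in for the mutual information maximization network, binary cross-entropy stands in for the similarity detection loss, and the function names (`info_nce_loss`, `bce_loss`, `joint_loss`), the `temperature`, and the `weight` are hypothetical. The cosine critic also assumes that embedding vectors and encoded vectors share the same dimensionality.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(embeddings, encoded, temperature=0.1):
    """InfoNCE loss over a batch (assumed estimator, not from the paper).

    Row i of `embeddings` and row i of `encoded` form the positive pair;
    every other row of `encoded` acts as a negative. Minimizing this loss
    maximizes the lower bound  I(X; Y) >= log(N) - loss.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        scores = [cosine(embeddings[i], encoded[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        top = max(scores)
        log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_denominator - scores[i]
    return total / n


def bce_loss(probability, label):
    """Binary cross-entropy for one predicted similarity probability."""
    eps = 1e-12
    p = min(max(probability, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def joint_loss(similarity_loss, mi_loss, weight=0.5):
    """Weighted sum of the two objectives; `weight` is a hypothetical knob."""
    return similarity_loss + weight * mi_loss


# Toy batch: encoded vectors aligned with their embeddings vs. shuffled.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = info_nce_loss(embeddings, aligned)
shuffled_loss = info_nce_loss(embeddings, shuffled)
mi_lower_bound = math.log(len(embeddings)) - aligned_loss

# Combine with a similarity loss for a pair labelled "similar" (label 1).
total = joint_loss(bce_loss(0.9, 1), aligned_loss)
```

In this toy batch the aligned encoding yields a much lower InfoNCE loss than the shuffled one, which is the behavior a mutual information term is meant to reward: the encoded vector should remain predictive of the embedding it was derived from. In practice both losses would be computed on learned network outputs and minimized together by gradient descent.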