
Cross-modality Consistency Network for Remote Sensing Text-image Retrieval
Author(s) -
Yuchen Sha,
Yujian Feng,
Miao He,
Yichi Jin,
Shuai You,
Yimu Ji,
Fei Wu,
Shangdong Liu,
Shaoshuai Che
Publication year - 2025
Publication title -
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.246
H-Index - 88
eISSN - 2151-1535
pISSN - 1939-1404
DOI - 10.1109/jstars.2025.3586914
Subject(s) - geoscience, signal processing and analysis, power, energy and industry applications
Remote Sensing Cross-modality Text-Image Retrieval (RSCTIR) aims to retrieve images matching a natural language description from a large gallery, and vice versa. Existing methods mainly capture local and global context information within each modality for cross-modality matching. However, these methods are prone to interference from redundant information, such as background noise and irrelevant words, and neglect the co-occurrence semantic relations between modalities (i.e., the probability of a piece of semantic information co-occurring with other information). To filter out intra-modality redundant information and capture inter-modality co-occurrence relations, we propose a Cross-modality Consistency Network (CCNet) comprising a Text-image Attention-conditioned Module (TAM) and a Co-occurrent Features Module (CFM). First, TAM fuses visual and textual feature representations through a cross-modality attention mechanism that focuses on semantically similar fine-grained image features and then generates aggregated visual representations. Second, CFM estimates co-occurrence probability by measuring fine-grained feature similarity, thereby reinforcing the relations of target-consistent features across modalities. In addition, we propose the Cross-modality Distinction (CD) loss to learn semantic consistency between modalities by compacting intra-class samples and separating inter-class samples. Extensive experiments on three benchmarks demonstrate that our approach outperforms state-of-the-art methods.
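To make the TAM step concrete, the following is a minimal PyTorch sketch of text-conditioned cross-modality attention. It is an illustration only, not the paper's implementation; the function name, tensor shapes, and temperature value are assumptions.

import torch
import torch.nn.functional as F

def text_guided_attention(text_feats, region_feats, temperature=0.1):
    # Hypothetical sketch, not the authors' code.
    # text_feats:   (B, Lt, D) word-level textual features
    # region_feats: (B, Lr, D) fine-grained image region features
    # Returns:      (B, Lt, D) text-conditioned aggregated visual representations
    t = F.normalize(text_feats, dim=-1)
    v = F.normalize(region_feats, dim=-1)
    # Each word attends over all image regions; semantically similar
    # regions receive high weight, background regions are suppressed.
    attn = torch.softmax(t @ v.transpose(1, 2) / temperature, dim=-1)  # (B, Lt, Lr)
    return attn @ region_feats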
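Similarly, the co-occurrence estimation in CFM can be sketched as pooling fine-grained cross-modality similarities into a distribution over regions. Again a hypothetical sketch under the same assumed shapes, not the authors' code.

import torch
import torch.nn.functional as F

def co_occurrence_probability(text_feats, region_feats):
    # Assumed illustration of measuring fine-grained similarity.
    t = F.normalize(text_feats, dim=-1)            # (B, Lt, D)
    v = F.normalize(region_feats, dim=-1)          # (B, Lr, D)
    sim = t @ v.transpose(1, 2)                    # (B, Lt, Lr) word-region similarity
    # Softmax over regions gives a co-occurrence distribution per word;
    # averaging over words yields a per-region co-occurrence probability
    # that can reweight (reinforce) target-consistent features.
    return torch.softmax(sim, dim=-1).mean(dim=1)  # (B, Lr)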
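Finally, the stated effect of the CD loss (compacting intra-class samples, separating inter-class samples) matches the shape of a margin-based bidirectional ranking loss; the sketch below shows one such formulation, assuming matched image-text pairs share the same batch row index. The margin value and exact form are assumptions, not taken from the paper.

import torch
import torch.nn.functional as F

def cross_modality_distinction_sketch(img_emb, txt_emb, margin=0.2):
    # img_emb, txt_emb: (B, D) global embeddings; row i of each is a matched pair.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    scores = img @ txt.t()                    # (B, B) cross-modality similarities
    pos = scores.diag().unsqueeze(1)          # (B, 1) matched-pair similarities
    # Hinge terms: push every mismatched pair at least `margin` below its
    # matched pair (separation) while pulling matched pairs up (compaction).
    cost_i2t = (margin + scores - pos).clamp(min=0)      # image -> text direction
    cost_t2i = (margin + scores - pos.t()).clamp(min=0)  # text -> image direction
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return (cost_i2t.masked_fill(mask, 0).mean()
            + cost_t2i.masked_fill(mask, 0).mean())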