
Frequency Spectrum Adaptor for Remote Sensing Image-Text Retrieval
Author(s) -
Ziyi Wan,
Enyuan Zhao,
Jie Nie,
Ze Zhang,
Zhiqiang Wei,
Nan Zheng,
Yuting Zhao
Publication year - 2025
Publication title -
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 1.246
H-Index - 88
eISSN - 2151-1535
pISSN - 1939-1404
DOI - 10.1109/jstars.2025.3589786
Subject(s) - geoscience , signal processing and analysis , power, energy and industry applications
Remote sensing image-text retrieval (RSITR) is a critical task that involves parsing the content of remote sensing (RS) images to match semantically relevant text. Existing RSITR methods mainly adopt pre-trained models directly and transfer them through fine-tuning, neglecting the complex, high-dimensional structured information in RS images, where texture, color, scale, and semantics are tightly coupled. Consequently, these methods are limited in handling specific receptive fields and in preserving the structural information within RS images, which leads to inaccurate retrieval matches. To mitigate this issue, this paper proposes a frequency spectrum adapter for RSITR that perceives the unique structured information of RS images to facilitate the transfer of visual-linguistic knowledge from the natural domain to the RS domain. The main contributions are as follows: 1) A frequency-domain-based RS image-text retrieval adapter (FRS-Adapter) is developed. By expanding the spectral receptive field, it extracts the unique structured information of RS images, improving fine-tuning for transfer from the natural-scene domain to the RS domain. 2) A unimodal filter bank is designed to perform unimodal spectral compression across multiple bands. Within each band, spectral features of the amplitude and phase structures are used to further enhance the representation of structured information. 3) A cross-modal spectrum mutual aggregation module is introduced to promote the deep integration of linguistic, spatial, and spectral information. It guides the retention of relevant frequency components and reduces the impact of irrelevant ones. 4) Quantitative and qualitative experiments on three large RS cross-modal retrieval datasets validate the effectiveness of the FRS-Adapter in RSITR.
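The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind a multi-band filter bank with amplitude and phase features. It is not the FRS-Adapter itself. The concentric radial band layout, the function names (`radial_band_masks`, `band_amplitude_phase`, `reconstruct`), and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def radial_band_masks(height, width, num_bands):
    """Split the centered 2-D frequency plane into concentric radial bands.

    Assumption: equal-width radial bands; the paper's filter bank may differ.
    """
    fy = np.fft.fftshift(np.fft.fftfreq(height))
    fx = np.fft.fftshift(np.fft.fftfreq(width))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    masks = []
    for i in range(num_bands):
        if i == num_bands - 1:
            upper = radius <= edges[i + 1]  # last band includes the outer edge
        else:
            upper = radius < edges[i + 1]
        masks.append((radius >= edges[i]) & upper)
    return masks


def band_amplitude_phase(image, num_bands=3):
    """Return a list of per-band (amplitude, phase) maps for a 2-D image."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    bands = []
    for mask in radial_band_masks(image.shape[0], image.shape[1], num_bands):
        band = spectrum * mask
        bands.append((np.abs(band), np.angle(band)))
    return bands


def reconstruct(bands):
    """Recombine per-band amplitude and phase into a spatial-domain image."""
    total = sum(amp * np.exp(1j * phase) for amp, phase in bands)
    return np.fft.ifft2(np.fft.ifftshift(total)).real
```

Because the band masks partition the frequency plane, summing the bands recovers the full spectrum, so `reconstruct(band_amplitude_phase(image))` returns the input image up to floating-point error. A learned adapter would presumably reweight or compress each band rather than pass it through unchanged.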