Open Access
Dual-Stream Spatially-Aware Transformer for Remote Sensing Image Captioning
Author(s) -
Haifeng Sima,
Xiangtao Ding,
JianLong Wang,
Mingliang Xu
Publication year - 2025
Publication title -
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 1.246
H-Index - 88
eISSN - 2151-1535
pISSN - 1939-1404
DOI - 10.1109/jstars.2025.3593887
Subject(s) - geoscience; signal processing and analysis; power, energy and industry applications
Remote sensing image captioning (RSIC) aims to generate semantically rich and syntactically accurate descriptions for remote sensing images. However, due to the complex spatial layouts, occlusions, and overlapping objects in such images, caption generation is often challenged by semantic ambiguity. To address these issues, we propose a novel Dual-Stream Spatially-Aware Transformer (DSAT), which explicitly models both global and local spatial relationships to enhance spatial understanding. Specifically, DSAT introduces a Dual-Stream Feature Interaction (DFI) module that extracts grid-level global features and region-level object features, and further enhances their respective spatial dependencies through multi-branch convolution and a graph attention network. Additionally, we design a Spatially-Aware Attention (SAA) mechanism that encodes relative spatial relationships into the Transformer, allowing the model to better capture object distribution patterns and geometric relationships. Extensive experiments conducted on three benchmark datasets, namely Sydney-Captions, UCM-Captions, and RSICD, highlight the superior performance of DSAT. The proposed method achieves CIDEr scores of 338.59%, 450.93%, and 275.36% on these datasets, respectively, demonstrating its effectiveness in generating high-quality captions for remote sensing images.
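
The abstract does not give the formulation of the Spatially-Aware Attention mechanism, so the following is only a minimal sketch of the general idea of encoding relative spatial relationships into attention, not the authors' implementation. It assumes each region is described by a box `(cx, cy, w, h)`, computes a log-scaled relative geometry vector for every pair of boxes, and adds a linear function of that vector to the usual scaled dot-product logits. The function names, the four geometry features, and the `weights` vector are all hypothetical choices made for illustration.

```python
import math


def relative_geometry(box_a, box_b, eps=1e-3):
    """Hypothetical 4-d relative geometry of box_b as seen from box_a.

    Boxes are (cx, cy, w, h). Offsets are normalised by the size of box_a
    and log-scaled; eps avoids log(0) when the centres coincide.
    """
    (xa, ya, wa, ha), (xb, yb, wb, hb) = box_a, box_b
    return (
        math.log(max(abs(xa - xb), eps) / wa),
        math.log(max(abs(ya - yb), eps) / ha),
        math.log(wb / wa),
        math.log(hb / ha),
    )


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def spatially_aware_attention(queries, keys, values, boxes, weights):
    """Single-head attention with an additive spatial bias (assumed form).

    logit[i][j] = q_i . k_j / sqrt(d) + weights . relative_geometry(box_i, box_j)
    In a trained model `weights` would be learned; here it is passed in.
    """
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        logits = []
        for j, key in enumerate(keys):
            content = sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            geometry = relative_geometry(boxes[i], boxes[j])
            bias = sum(w * g for w, g in zip(weights, geometry))
            logits.append(content + bias)
        attn = softmax(logits)
        # Weighted sum of value vectors, one output channel at a time.
        outputs.append(
            [sum(a * v[c] for a, v in zip(attn, values)) for c in range(len(values[0]))]
        )
    return outputs
```

With all `weights` set to zero the bias vanishes and the function reduces to ordinary scaled dot-product attention. Negative weights on the first two features make each region attend more strongly to spatially nearby regions, which is one simple way geometry can shape the attention pattern.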
