Open Access

SAR Strikes Back: A New Hope for RSVQA

Author(s) - Lucrezia Tosato, Sylvain Lobry, Flora Weissgerber, Laurent Wendling

Publication year - 2025

Publication title - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 1.246

H-Index - 88

eISSN - 2151-1535

pISSN - 1939-1404

DOI - 10.1109/jstars.2025.3596678

Subject(s) - geoscience, signal processing and analysis, power, energy and industry applications

Remote Sensing Visual Question Answering (RSVQA) is the task of automatically answering questions about satellite images: a model extracts information from an image, processes a question, and predicts the answer in textual form, helping with the interpretation of the image. While different methods have been proposed to extract information from optical images with different spectral bands and resolutions, only recently have preliminary studies started exploring very high-resolution Synthetic Aperture Radar (SAR) data. These studies leverage SAR's ability to capture electromagnetic information and operate in all atmospheric conditions. However, no research has compared the results obtained using SAR and optical imagery or explored methods to fuse the two modalities effectively. This work investigates the integration of SAR images into the RSVQA task and explores the most effective way to combine them with optical images. To this end, we introduce a dataset that enables SAR-based RSVQA and study two pipelines that include the SAR modality. The first is an end-to-end approach, while the second is a two-stage framework: relevant information is first extracted from the SAR image and translated into natural language, and a second step, which relies only on a language model, then provides the answer. Our results show that the second pipeline achieves strong performance using SAR alone, yielding an improvement of nearly 10% in overall accuracy compared to the first one. We then explore various fusion methods to use SAR and optical images together. A fusion at the decision level achieves the best results on the proposed dataset, with a final F1-micro score of 75.00% and an F1-average of 81.21% for classification, as well as an overall accuracy of 75.49% for VQA. We show that SAR data offers additional information when fused with the optical modality, particularly for questions related to specific land cover classes, such as water areas.
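
As a rough illustration of what fusion at the decision level can look like, the sketch below combines the per-answer probabilities of a SAR branch and an optical branch after each has made its own prediction. The abstract does not state the fusion rule used in the paper, so the weighted average, the function name `fuse_decisions`, and the weight `w_sar` are all assumptions made for illustration only.

```python
from typing import Dict


def fuse_decisions(
    p_sar: Dict[str, float],
    p_opt: Dict[str, float],
    w_sar: float = 0.5,
) -> str:
    """Hypothetical decision-level (late) fusion of two answer distributions.

    Each branch is assumed to output a probability per candidate answer.
    The rule here is a weighted average followed by an argmax; it is an
    illustrative choice, not the method reported in the paper.
    """
    if not 0.0 <= w_sar <= 1.0:
        raise ValueError("w_sar must lie in [0, 1]")

    # Consider every answer proposed by either branch; a missing answer
    # counts as probability 0 for that branch.
    answers = sorted(set(p_sar) | set(p_opt))
    fused = {
        a: w_sar * p_sar.get(a, 0.0) + (1.0 - w_sar) * p_opt.get(a, 0.0)
        for a in answers
    }

    # Sorting the answers first makes tie-breaking deterministic.
    return max(answers, key=lambda a: fused[a])


# Toy example: a question such as "Is there water in the image?"
sar_scores = {"yes": 0.8, "no": 0.2}      # made-up SAR branch output
optical_scores = {"yes": 0.4, "no": 0.6}  # made-up optical branch output
answer = fuse_decisions(sar_scores, optical_scores)
```

With equal weights the confident SAR prediction outweighs the less certain optical one in this toy case; lowering `w_sar` shifts the decision toward the optical branch.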
