
SAR Strikes Back: A New Hope for RSVQA
Author(s) -
Lucrezia Tosato,
Sylvain Lobry,
Flora Weissgerber,
Laurent Wendling
Publication year - 2025
Publication title -
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 1.246
H-Index - 88
eISSN - 2151-1535
pISSN - 1939-1404
DOI - 10.1109/jstars.2025.3596678
Subject(s) - geoscience, signal processing and analysis, power, energy and industry applications
Remote Sensing Visual Question Answering (RSVQA) is a task in which information is automatically extracted from satellite images and a question is processed to predict an answer in textual form, helping with the interpretation of the image. While several methods have been proposed to extract information from optical images with different spectral bands and resolutions, only recently have preliminary studies begun to explore very high-resolution Synthetic Aperture Radar (SAR) data. These studies leverage SAR's ability to capture electromagnetic information and to operate in all atmospheric conditions. However, no research has compared the results obtained with SAR and optical imagery or explored methods to fuse the two modalities effectively. This work investigates the integration of SAR images into the RSVQA task and the most effective way to combine them with optical images. To this end, we present a dataset that allows SAR images to be introduced into the RSVQA framework, and we study two pipelines that include the SAR modality. The first is an end-to-end approach, while the second is a two-stage framework: relevant information is first extracted from the SAR image and translated into natural language, and the second stage relies only on a language model to provide the answer. Our results show that the second pipeline achieves strong performance using SAR alone, yielding an improvement of nearly 10% in overall accuracy over the first. We then explore several fusion methods for using SAR and optical images together. Fusion at the decision level achieves the best results on the proposed dataset, with a final F1-micro score of 75.00% and an F1-average of 81.21% for classification, as well as an overall accuracy of 75.49% for VQA. We show that SAR data offers additional information when fused with the optical modality, particularly for questions related to specific land cover classes, such as water areas.
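To make the idea of decision-level fusion concrete, the sketch below combines per-class probabilities from two independent classifiers, one per modality, by weighted averaging followed by thresholding. This is only an illustration of the general technique, not the authors' implementation: the function name `fuse_decisions`, the weight `sar_weight`, the 0.5 threshold, the class names, and the example probabilities are all assumptions made for this example.

```python
from typing import List, Sequence


def fuse_decisions(
    sar_probs: Sequence[float],
    opt_probs: Sequence[float],
    sar_weight: float = 0.5,   # hypothetical weight given to the SAR branch
    threshold: float = 0.5,    # hypothetical decision threshold
) -> List[bool]:
    """Late (decision-level) fusion for multi-label classification.

    Each branch is assumed to output one probability per land cover class.
    The fused score is a weighted average of the two, and a class is
    predicted as present when the fused score reaches the threshold.
    """
    if len(sar_probs) != len(opt_probs):
        raise ValueError("both branches must score the same set of classes")
    if not 0.0 <= sar_weight <= 1.0:
        raise ValueError("sar_weight must lie in [0, 1]")

    fused = [
        sar_weight * s + (1.0 - sar_weight) * o
        for s, o in zip(sar_probs, opt_probs)
    ]
    return [score >= threshold for score in fused]


# Illustrative values only: the SAR branch is confident about water,
# the optical branch leans towards forest.
classes = ["water", "forest", "urban"]
sar = [0.9, 0.2, 0.4]
optical = [0.3, 0.7, 0.5]
present = fuse_decisions(sar, optical)
print([name for name, flag in zip(classes, present) if flag])
```

Because the fusion happens only on the outputs, each branch can be trained and evaluated separately, which also makes it straightforward to compare single-modality and fused results.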