A Light-weighted Fusion Vision Mamba for Multimodal Remote Sensing Data Classification

Open Access

Author(s) - Xin He, Xiao Han, Yushi Chen, Lingbo Huang

Publication year - 2025

Publication title - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 1.246

H-Index - 88

eISSN - 2151-1535

pISSN - 1939-1404

DOI - 10.1109/jstars.2025.3598755

Subject(s) - geoscience, signal processing and analysis, power, energy and industry applications

Recent studies have shown that Vision Mamba (VMamba) excels at long-sequence modeling and offers efficient visual representation learning. However, existing VMamba-based methods focus primarily on a single modality and are not readily adaptable to multimodal data. In this study, we investigate a light-weighted fusion VMamba for multimodal remote sensing data classification. First, to integrate information from different modalities, we propose a spatial and channel fusion VMamba. For spatial fusion, a two-branch state space model is constructed based on VMamba, in which the parameters of the two branches interact to merge spatial information from the different modalities. For channel fusion, a channel fusion VMamba is introduced that uses the fast Fourier transform to perform a specific eigenvalue computation in the frequency domain for more effective feature fusion. Second, to reduce the computational cost of the fusion VMamba, we explore a light-weighted variant. Specifically, information from the different modalities is reconstructed with a skip sampling scanning scheme, which replaces the standard scanning scheme and reduces the number of parameters in VMamba. Extensive experiments on three public multimodal remote sensing datasets demonstrate that the proposed light-weighted fusion VMamba surpasses state-of-the-art methods in both classification accuracy and computational cost. For instance, it requires 20% fewer FLOPs than the standard VMamba on the Houston dataset.
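
The abstract does not spell out the skip sampling scanning scheme, so the following is only a minimal sketch of one plausible reading: a feature grid is sampled with a fixed stride, giving several shorter token sequences in place of one full-length scan, and the sequences are scattered back to their original positions afterwards. The function names `skip_scan` and `skip_merge`, the `step` parameter, and the row-major ordering are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence

Grid = List[List[float]]


def skip_scan(grid: Sequence[Sequence[float]], step: int = 2) -> List[List[float]]:
    """Split an H x W grid into step * step row-major subsequences.

    Subsequence (i, j) visits rows i, i + step, ... and columns j, j + step, ...
    (Hypothetical helper; the stride pattern is an assumption.)
    """
    h, w = len(grid), len(grid[0])
    if h % step or w % step:
        raise ValueError("grid height and width must be divisible by step")
    sequences = []
    for i in range(step):
        for j in range(step):
            sequences.append(
                [grid[r][c] for r in range(i, h, step) for c in range(j, w, step)]
            )
    return sequences


def skip_merge(sequences: Sequence[Sequence[float]], h: int, w: int, step: int = 2) -> Grid:
    """Inverse of skip_scan: scatter each subsequence back to its grid positions."""
    grid = [[0.0] * w for _ in range(h)]
    k = 0
    for i in range(step):
        for j in range(step):
            values = iter(sequences[k])
            for r in range(i, h, step):
                for c in range(j, w, step):
                    grid[r][c] = next(values)
            k += 1
    return grid


# A 4 x 4 toy "feature map" whose entries are their row-major indices.
grid = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
seqs = skip_scan(grid, step=2)             # four sequences of four tokens each
restored = skip_merge(seqs, 4, 4, step=2)  # identical to the original grid
```

In this sketch each subsequence holds H·W/step² tokens, so any per-sequence model would process shorter inputs than it would under a full-grid scan. How the paper turns this into parameter and FLOP savings is not described in the abstract.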
