MSCA-Net: Multi-Scale Chunked Attention Network for High-Resolution Satellite Stereo Matching
Author(s) - Yixiao Wang, Zhenquan Wen, Xu Huang
Publication year - 2025
Publication title - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.246
H-Index - 88
eISSN - 2151-1535
pISSN - 1939-1404
DOI - 10.1109/jstars.2025.3622164
Subject(s) - Geoscience, Signal Processing and Analysis, Power, Energy and Industry Applications
Abstract - Dense Image Matching (DIM) of high-resolution satellite stereo pairs has wide applications in large-scale 3D reconstruction. In recent years, deep learning techniques have been widely applied to stereo matching due to their remarkable ability to extract deep features from satellite imagery. However, these methods still face challenges in scenarios such as occlusions, disparity discontinuities, texture-less regions, and repetitive patterns. To overcome these challenges, this paper proposes the Multi-Scale Chunked Attention Network (MSCA-Net), an end-to-end stereo framework that improves disparity estimation accuracy by optimizing cost aggregation through a chunked processing strategy. First, multi-scale cost volumes are constructed using pyramidal feature extraction. Second, a Hierarchical Spatial Aggregation (HSA) module dynamically captures multi-dimensional information in the 4D cost volume via a joint attention mechanism, enhancing initial feature aggregation. Third, a Chunked Attention Hourglass (CAH) module applies a chunk-wise attention strategy to precisely adjust features across regions, minimizing interference from redundant information. Finally, an edge-enhanced disparity refinement module improves the edge regions of the disparity map, reducing semantic distortion. Comprehensive comparisons on publicly available satellite stereo datasets demonstrate that the proposed network significantly outperforms existing methods in challenging regions, achieving end-point errors (EPE) of 1.347 and 1.424 pixels on the Urban Semantic 3D and WHU-Stereo datasets, respectively, reductions of approximately 4.9% and 13.1% relative to current state-of-the-art methods, demonstrating superior overall performance.
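Below is a minimal, illustrative PyTorch-style sketch of the chunk-wise attention idea described in the abstract: attention weights are computed independently within spatial chunks of a 4D cost volume, so aggregation in one region is not dominated by statistics from distant, unrelated regions. The module name ChunkedCostAttention, the chunk size, and the squeeze-and-excitation-style channel gating are assumptions made for illustration only, not the paper's actual CAH implementation.

# Illustrative sketch only: chunk-wise attention over a 4D cost volume
# of shape (B, C, D, H, W). Names and hyperparameters are hypothetical,
# not taken from the MSCA-Net source code.
import torch
import torch.nn as nn

class ChunkedCostAttention(nn.Module):
    """Re-weights channels independently within spatial chunks of a cost
    volume (hypothetical formulation of chunk-wise attention)."""

    def __init__(self, channels: int, chunk: int = 32, reduction: int = 4):
        super().__init__()
        self.chunk = chunk
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),                       # per-chunk global context
            nn.Conv3d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # channel-wise gating weights
        )

    def forward(self, cost: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = cost.shape
        s = self.chunk
        out = cost.clone()
        # Visit each spatial chunk and gate its channels independently,
        # so redundant context outside the chunk cannot interfere.
        for y in range(0, h, s):
            for x in range(0, w, s):
                block = cost[:, :, :, y:y + s, x:x + s]
                out[:, :, :, y:y + s, x:x + s] = block * self.attn(block)
        return out

if __name__ == "__main__":
    vol = torch.randn(1, 16, 24, 64, 64)    # toy cost volume: B, C, D, H, W
    refined = ChunkedCostAttention(16)(vol)
    print(refined.shape)                     # torch.Size([1, 16, 24, 64, 64])

In the network described by the abstract, a block of this kind would sit inside the cost-aggregation hourglass alongside the multi-scale and edge-refinement stages; here it is shown in isolation on a toy cost volume purely to clarify the chunked-processing concept.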