
LG-Umer: UNet-like Network Integrate Local-Global Feature with Novel Attention for Road Extraction from Remote Sensing Images
Author(s) -
Penghui Niu,
Taotao Cai,
Yajuan Zhang,
Ping Zhang,
Wenjia Xu,
Junhua Gu,
Jungong Han
Publication year - 2025
Publication title -
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 1.246
H-Index - 88
eISSN - 2151-1535
pISSN - 1939-1404
DOI - 10.1109/jstars.2025.3573735
Subject(s) - geoscience , signal processing and analysis , power, energy and industry applications
Road extraction from remote sensing images (RSIs) is a key research area in smart city development. While deep learning techniques have demonstrated remarkable effectiveness in this domain, existing approaches exhibit limitations: convolutional neural network (CNN)-based methods struggle to capture global contextual information for long-range road networks, vision transformer (ViT)-based methods fail to adequately extract multi-scale local features, and hybrid CNN-ViT architectures overlook the synergistic guidance between local and global features. To address these challenges, we propose LG-Umer, a UNet-like network that integrates Local-Global features with a novel attention mechanism, combining the complementary strengths of CNNs and ViTs within an encoder-decoder framework. Specifically, the encoder employs a Multi-scale Strip Deformational (MSD) module, which utilizes deformable convolutions to adaptively extract topological structures and variable-shaped local road features. In the decoder, a Multi-stage Gate Unit (MGU) module is introduced, incorporating a novel attention mechanism to model long-range dependencies by leveraging local features as attention operators for global feature refinement. Extensive experiments on three public benchmarks demonstrate the superiority of LG-Umer. It achieves IoU scores of 70.4%, 71.2%, and 68.7% on the Massachusetts Road, DeepGlobe Road, and CHN6-CUG datasets, respectively, surpassing recent state-of-the-art (SOTA) methods by 1.2%, 0.9%, and 1.1%. These results validate the effectiveness of our approach in balancing local detail preservation and global contextual modeling for road extraction tasks.
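
The IoU (intersection over union) scores quoted above compare a predicted road mask against a ground-truth mask. The abstract does not describe the evaluation protocol, so the following is only a minimal sketch of the standard binary IoU definition, not the authors' evaluation code. The function name `binary_iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def binary_iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1).

    Illustrative helper, not taken from the paper.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel marked as road in both masks
            if p or t:
                union += 1  # pixel marked as road in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 3x4 masks: 1 = road pixel, 0 = background.
pred = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
target = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]

iou = binary_iou(pred, target)
print(f"IoU = {iou:.3f}")
```

In this toy case the masks share 3 road pixels and together cover 5, so the IoU is 3/5. Benchmark figures are typically aggregated over a full test set, and whether the aggregation is per image or over all pixels is not stated in the abstract.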