HiTrans-SAM: Hierarchical Transformer Encoder and SAM-Augmented Inputs for Multi-Scale Remote Sensing Image Segmentation
Author(s) -
Yulian Li,
Jiyang Gao,
Yikang Du,
Yuxuan Xiao,
Zhengjie Gao,
Haitao Huang
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3617388
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Semantic segmentation of remote sensing images is challenging due to complex scenes, substantial variations in object scale, and ambiguous boundaries. In this study, we propose HiTrans-SAM, a method that combines a Hierarchical Transformer Encoder with SAM-augmented inputs for multi-scale remote sensing image segmentation. The framework adopts an encoder-decoder architecture. First, prior to encoding, the input image is augmented with boundary prior maps generated by SAM, thereby mitigating boundary ambiguity. Subsequently, a Hierarchical Transformer Encoder is integrated into the encoding network to facilitate information propagation; this module captures high-resolution spatial details while leveraging global contextual relationships. During the decoding phase, multi-scale feature fusion is performed so that features at every scale contribute to the prediction, improving segmentation accuracy. Experiments on the LoveDA, Potsdam, and Vaihingen datasets demonstrate state-of-the-art performance, achieving mean Intersection over Union (mIoU) values of 53.52% (LoveDA), 79.45% (Potsdam), and 75.12% (Vaihingen), significantly outperforming existing methods. The results validate the algorithm's efficacy in enhancing segmentation accuracy through boundary refinement, context modeling, and multi-scale feature fusion.
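The abstract reports its results as mean Intersection over Union (mIoU). As a minimal sketch of how that metric is commonly computed, the following pure-Python example builds a confusion matrix from flattened label maps and averages the per-class IoU. The function names `confusion_matrix` and `mean_iou` are illustrative, and the choice to skip classes that appear in neither the ground truth nor the prediction is an assumption. The paper's exact evaluation protocol (for example, ignored labels or per-dataset class sets) is not stated in the abstract.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int],
             num_classes: int) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both label maps are skipped
    rather than counted as IoU 0 or 1.
    """
    cm = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # predicted c, truth differs
        fn = sum(cm[c]) - tp                                 # truth c, predicted differs
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six flattened pixels.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(truth, pred, num_classes=3))
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported figure such as 53.52% corresponds to this quantity accumulated over all test pixels and expressed as a percentage.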