Open Access
CTNet: Multimodal Remote Sensing Image Key Point Detection and Description for CNN and Transformer Architectures
Author(s) -
Chenke Yue,
Yin Zhang,
Junhua Yan,
Yong Liu,
Pengyu Guo
Publication year - 2025
Publication title -
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.246
H-Index - 88
eISSN - 2151-1535
pISSN - 1939-1404
DOI - 10.1109/jstars.2025.3595440
Subject(s) - geoscience , signal processing and analysis , power, energy and industry applications
Keypoint detection and description from multisensor or multimodal images are fundamental to image registration and its downstream tasks. However, the nonlinear radiometric differences, illumination variations, and geometric distortions between multimodal remote sensing images pose significant challenges. To address these issues, this paper proposes a weakly supervised multimodal keypoint detection and description network (CTNet), which extracts robust and repeatable feature descriptors at a low cost without requiring densely labeled annotations or extensive pretraining. In terms of network design, CTNet effectively combines convolutional neural network (CNN) and Transformer architectures by introducing a multimodal global and local information interaction (MGLI) module. Additionally, a lightweight keypoint detector is designed to efficiently detect keypoints by evaluating pixel saliency within neighborhoods and incorporating their depth maxima. For model optimization, a novel loss function, multiple pair weighted loss, is introduced. This loss function samples and weights positive and negative pairs of multimodal features, effectively capturing the similarity relationships among samples to learn a robust feature embedding space. Finally, CTNet is evaluated on both public and self-collected multimodal VIS-SAR and VIS-IR image datasets and compared with state-of-the-art keypoint detection and description models. Experimental results demonstrate that CTNet achieves superior matching accuracy and robustness in multimodal image matching tasks, outperforming existing methods.
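The abstract's "multiple pair weighted loss" samples positive and negative descriptor pairs across modalities and weights them by difficulty. The paper's exact formulation is not given here; the sketch below illustrates the general idea with a multi-similarity-style weighting over L2-normalised descriptors, where hard positives (low similarity) and hard negatives (high similarity) receive larger effective weights. All function and parameter names (`multi_pair_weighted_loss`, `alpha`, `beta`, `margin`) are illustrative assumptions, not the authors' API.

```python
import numpy as np

def multi_pair_weighted_loss(desc_a, desc_b, alpha=2.0, beta=50.0, margin=0.5):
    """Illustrative weighted positive/negative pair loss (a sketch, not
    the paper's exact loss).

    desc_a, desc_b: (N, D) L2-normalised descriptors from two modalities;
    row i of each matrix is assumed to describe the same keypoint, so the
    diagonal of the similarity matrix holds the positive pairs.
    """
    sim = desc_a @ desc_b.T                      # (N, N) cosine similarities
    pos = np.diag(sim)                           # positive (matching) pairs

    # Soft-plus weighting: positives far below the margin and negatives
    # far above it dominate the loss, emphasising hard examples.
    pos_term = np.log1p(np.exp(-alpha * (pos - margin))) / alpha

    neg = sim.copy()
    np.fill_diagonal(neg, -np.inf)               # exclude positives
    neg_term = np.log1p(np.sum(np.exp(beta * (neg - margin)), axis=1)) / beta

    return float(np.mean(pos_term + neg_term))
```

With correctly matched descriptor rows the loss should be lower than with a shuffled (mismatched) pairing, which is the behaviour a metric-learning loss for cross-modal matching needs.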
