
Cross-view Geo-Localization for Autonomous UAV using Locally-Aware Transformer-based Network
Author(s) -
Duc Viet Bui,
Masao Kubo,
Hiroshi Sato
Publication year - 2023
Publication title -
IEEE Access
Language(s) - English
Resource type - Journals
ISSN - 2169-3536
DOI - 10.1109/ACCESS.2023.3317950
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Although GPS is commonly used for the autonomous flight of unmanned aerial vehicles (UAVs), in GPS-denied environments researchers mainly turn to image-based localization methods because of their substantial advantages. In this study, we address the problem of image-based geo-localization between UAV and satellite imagery (known as cross-view geo-localization), which is an essential step towards image-based localization. In cross-view geo-localization, extracting fine-grained features that contain contextual information is challenging because of the large gap in visual representation between the two views. Existing methods in this field often use convolutional neural networks (CNNs) as feature extractors. However, the limited receptive field of CNNs leads to a loss of fine-grained information. Some researchers have adopted Transformer-based networks to overcome this limitation; however, these approaches focus only on understanding the meaning of each pixel through attention and only partially utilize the tokens produced by the Transformer blocks. In contrast to these works, we propose a Vision Transformer-based network that takes advantage of the local tokens in addition to the classification token. In experiments, our proposed model significantly outperforms existing state-of-the-art models, demonstrating the promise of this approach for future development.
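The abstract's core idea, fusing the classification (CLS) token with the local patch tokens emitted by a Vision Transformer to build a retrieval descriptor, can be illustrated with a minimal sketch. The fusion rule below (concatenating the CLS token with mean-pooled local tokens, then L2-normalizing for cosine-similarity retrieval) is a hypothetical simplification for illustration only; the paper's actual network and token-aggregation scheme are not reproduced here.

```python
import numpy as np

def build_descriptor(tokens):
    """Fuse ViT output tokens into one retrieval descriptor.

    tokens: (1 + N, D) array from a ViT encoder; row 0 is the
    classification (CLS) token, rows 1..N are the local patch tokens.
    Hypothetical fusion: concatenate the CLS token with the mean of
    the local tokens, then L2-normalize for cosine retrieval.
    """
    cls_tok = tokens[0]                 # global summary token
    local = tokens[1:].mean(axis=0)     # pooled local context
    desc = np.concatenate([cls_tok, local])
    return desc / np.linalg.norm(desc)

def retrieve(query_desc, gallery_descs):
    """Return the index of the gallery descriptor most similar to the
    query (cosine similarity; descriptors are unit-normalized)."""
    sims = gallery_descs @ query_desc
    return int(np.argmax(sims))

# Toy usage: match a UAV-view descriptor against a satellite gallery.
rng = np.random.default_rng(0)
uav_tokens = rng.normal(size=(197, 768))    # ViT-B-like: 196 patches + CLS
query = build_descriptor(uav_tokens)
gallery = np.stack(
    [build_descriptor(rng.normal(size=(197, 768))) for _ in range(4)]
    + [query]                               # matching satellite view last
)
best = retrieve(query, gallery)
```

In a real cross-view pipeline the UAV and satellite branches would share (or partially share) a trained Transformer backbone, and the descriptors would be learned with a metric-learning loss rather than computed from random tokens.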