
Cross-view Geo-Localization for Autonomous UAV using Locally-Aware Transformer-based Network
Author(s) -
Duc Viet Bui,
Masao Kubo,
Hiroshi Sato
Publication year - 2023
Publication title -
IEEE Access
Language(s) - English
Resource type - Journals
ISSN - 2169-3536
DOI - 10.1109/ACCESS.2023.3317950
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Although GPS is commonly used for the autonomous flight of unmanned aerial vehicles (UAVs), in GPS-denied environments researchers mainly turn to image-based localization methods because of their substantial advantages. In this study, we address the problem of image-based geo-localization between UAV and satellite imagery (known as cross-view geo-localization), which is an essential step towards image-based localization. In cross-view geo-localization, extracting fine-grained features that contain contextual information is challenging because of the large gap in visual representation between the two views. Existing methods in this field often use convolutional neural networks (CNNs) as feature extractors. However, the limited receptive field of CNNs leads to a loss of fine-grained information. Some researchers have adopted Transformer-based networks to overcome this limitation; however, these approaches focus only on understanding the meaning of each pixel through attention and only partially utilize the tokens produced by the Transformer blocks. In contrast to these works, we propose a Vision Transformer-based network that takes advantage of the local tokens in addition to the classification token. In experiments, our proposed model significantly outperforms existing state-of-the-art models, demonstrating the promise of this approach for future development.
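The abstract's core idea, fusing the classification (CLS) token with the local patch tokens emitted by a Vision Transformer to build a retrieval descriptor, can be illustrated with a minimal sketch. The fusion rule below (concatenating the CLS token with mean-pooled local tokens, then L2-normalizing for cosine-similarity retrieval) is a hypothetical simplification for illustration only; the paper's actual network and token-aggregation scheme are not reproduced here.

```python
import numpy as np

def build_descriptor(tokens):
    """Fuse ViT output tokens into one retrieval descriptor.

    tokens: (1 + N, D) array from a ViT encoder; row 0 is the
    classification (CLS) token, rows 1..N are the local patch tokens.
    Hypothetical fusion: concatenate the CLS token with the mean of
    the local tokens, then L2-normalize for cosine retrieval.
    """
    cls_tok = tokens[0]                 # global summary token
    local = tokens[1:].mean(axis=0)     # pooled local context
    desc = np.concatenate([cls_tok, local])
    return desc / np.linalg.norm(desc)

def retrieve(query_desc, gallery_descs):
    """Return the index of the gallery descriptor most similar to the
    query (cosine similarity; descriptors are unit-normalized)."""
    sims = gallery_descs @ query_desc
    return int(np.argmax(sims))

# Toy usage: match a UAV-view descriptor against a satellite gallery.
rng = np.random.default_rng(0)
uav_tokens = rng.normal(size=(197, 768))    # ViT-B-like: 196 patches + CLS
query = build_descriptor(uav_tokens)
gallery = np.stack(
    [build_descriptor(rng.normal(size=(197, 768))) for _ in range(4)]
    + [query]                               # matching satellite view last
)
best = retrieve(query, gallery)
```

In a real cross-view pipeline the UAV and satellite branches would share (or partially share) a trained Transformer backbone, and the descriptors would be learned with a metric-learning loss rather than computed from random tokens.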