A Coarse-to-Fine Object Tracking Based on Attention Mechanism
Author(s) -
Haixi Wen,
Xiaoming Chen,
Li Zhou
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/2010/1/012005
Subject(s) - computer science , artificial intelligence , minimum bounding box , bittorrent tracker , video tracking , computer vision , tracking (education) , segmentation , convolutional neural network , bounding overwatch , eye tracking , locality , pattern recognition (psychology) , object (grammar) , image (mathematics) , psychology , pedagogy , linguistics , philosophy
In visual object tracking, convolutional neural networks perform well, but the inherent locality of convolution makes it difficult for them to learn global, long-range semantic interactions. In addition, existing template updating methods focus on the target position in the next frame, which integrates more irrelevant background and causes tracking instability. To address these issues, and inspired by the Transformer, we propose a coarse-to-fine tracking method based on the attention mechanism (CTFT). First, we employ Swin Transformer blocks in place of the original convolutional backbone to achieve self-attention from local to global. Second, we use attention to combine template and search region features effectively in the feature fusion stage, and we explore, for the first time, the possibility of using an attention mechanism as the backbone in object tracking. Finally, we propose a coarse-to-fine tracking strategy. In the offline coarse tracking stage, an initial estimate of the target object is generated to obtain the coarse regression bounding box of the target. In the online fine tracking stage, we use the coarse regression bounding box to expand the corresponding object frame and then use segmentation to obtain the fine regression bounding box of the object. Experiments on challenging benchmarks, including GOT-10k, LaSOT, VOT2018, VOT2019, UAV123 and OTB-100, demonstrate that the proposed CTFT outperforms many state-of-the-art trackers and achieves leading performance.
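
The coarse-to-fine step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`expand_box`, `mask_to_box`, `refine_box`), the expansion factor `scale`, and the thresholding `segment_fn` used in the demo are all hypothetical stand-ins, and the abstract does not specify how the box is expanded or which segmentation model is used. The sketch only assumes that the coarse box is enlarged about its center, that the enlarged crop is segmented into a binary mask, and that the fine box is the tight box around that mask.

```python
import numpy as np

def expand_box(box, scale, img_w, img_h):
    """Enlarge an (x1, y1, x2, y2) box about its center by `scale`, clipped to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    nx1 = int(max(0, np.floor(cx - half_w)))
    ny1 = int(max(0, np.floor(cy - half_h)))
    nx2 = int(min(img_w, np.ceil(cx + half_w)))
    ny2 = int(min(img_h, np.ceil(cy + half_h)))
    return nx1, ny1, nx2, ny2

def mask_to_box(mask, offset=(0, 0)):
    """Tight (x1, y1, x2, y2) box around nonzero mask pixels; None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    ox, oy = offset
    return (int(xs.min()) + ox, int(ys.min()) + oy,
            int(xs.max()) + 1 + ox, int(ys.max()) + 1 + oy)

def refine_box(coarse_box, segment_fn, image, scale=1.5):
    """Coarse-to-fine refinement: expand, segment the crop, fit a tight box.

    `segment_fn` is a hypothetical callable mapping a crop to a binary mask;
    it stands in for whatever segmentation model a real tracker would use.
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = expand_box(coarse_box, scale, w, h)
    mask = segment_fn(image[y1:y2, x1:x2])
    fine = mask_to_box(mask, offset=(x1, y1))
    # Fall back to the coarse estimate when segmentation finds nothing.
    return fine if fine is not None else tuple(coarse_box)

# Toy demo: a bright rectangle on a dark image, with a deliberately poor coarse box.
image = np.zeros((100, 100), dtype=np.float32)
image[40:60, 30:70] = 1.0                       # object covers x in [30, 70), y in [40, 60)
coarse = (35, 35, 60, 65)                       # coarse estimate, misaligned with the object
fine = refine_box(coarse, lambda crop: crop > 0.5, image, scale=2.0)
print(fine)                                     # (30, 40, 70, 60)
```

In the demo the coarse box misses part of the object, but the expanded crop contains it fully, so the box fitted to the mask recovers the object's extent. If the expansion were too small to cover the object, the fine box would be truncated at the crop boundary, which is one reason the expansion factor matters in such a scheme.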
