
Visual tracking based on semantic and similarity learning
Author(s) -
Zha Yufei,
Wu Min,
Qiu Zhuling,
Yu Wangsheng
Publication year - 2019
Publication title -
IET Computer Vision
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.38
H-Index - 37
eISSN - 1751-9640
pISSN - 1751-9632
DOI - 10.1049/iet-cvi.2018.5826
Subject(s) - artificial intelligence, discriminative model, computer science, similarity (geometry), pattern recognition (psychology), feature (linguistics), semantic similarity, tracker, class (philosophy), feature extraction, semantic feature, eye tracking, binary classification, clutter, similarity learning, machine learning, support vector machine, image (mathematics)
The authors present a method that combines the similarity and semantic features of a target to improve tracking performance in video sequences. Trackers based on Siamese networks have achieved success on recent benchmarks and competitions by learning similarity from binary labels. Unfortunately, such weak labels limit the discriminative ability of the learned features, making it difficult to distinguish the target from distractors of the same class. The authors observe that inter-class semantic features help increase the separation between the target and the background, including distractors. They therefore propose a network architecture with both a similarity branch and a semantic branch, yielding more discriminative features for locating the target accurately in new frames. The large-scale ImageNet VID dataset is employed to train the network. Even in the presence of background clutter, visual distortion, and distractors, the proposed method keeps tracking the target. The authors evaluate their method on the open benchmarks OTB and UAV123. The results show that the combined approach significantly improves tracking performance relative to trackers using similarity or semantic features alone.
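The core idea of the abstract, fusing a similarity response map (from cross-correlating a target template over a search region, as in Siamese trackers) with a semantic response map to suppress same-class distractors, can be illustrated with a toy NumPy sketch. The function names, the fusion weight `alpha`, and the min-max normalization are hypothetical illustrations, not the paper's actual architecture:

```python
import numpy as np

def cross_correlate(search, template):
    """Similarity branch (sketch): slide the template over the search
    region and record the inner-product response at each offset."""
    sh, sw = search.shape
    th, tw = template.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[i:i + th, j:j + tw] * template)
    return out

def fuse_responses(sim_map, sem_map, alpha=0.5):
    """Weighted fusion of similarity and semantic response maps.
    alpha is a hypothetical balancing weight; each map is min-max
    normalized so neither branch dominates by scale alone."""
    def norm(m):
        return (m - m.min()) / (np.ptp(m) + 1e-8)
    return alpha * norm(sim_map) + (1 - alpha) * norm(sem_map)

# Toy example: plant the target at offset (5, 7) in a random "feature map".
rng = np.random.default_rng(0)
search = rng.standard_normal((16, 16))
template = search[5:9, 7:11].copy()

sim = cross_correlate(search, template)
sem = np.zeros_like(sim)
sem[5, 7] = 1.0  # pretend the semantic branch fires on the true target

fused = fuse_responses(sim, sem)
peak = np.unravel_index(np.argmax(fused), fused.shape)
print(peak)  # the fused peak recovers the planted offset (5, 7)
```

In a real tracker both maps would come from deep feature embeddings rather than raw pixels, but the fusion step shows why a semantic cue helps: a distractor of the same appearance can score highly on similarity alone, while the semantic branch re-weights the response toward the true target.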