Open Access
Adaptive Triplet Model for Fine-Grained Visual Categorization
Author(s) - Jingyun Liang, Jinlin Guo, Yanming Guo, Songyang Lao
Publication year - 2018
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2018.2884695
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Fine-grained visual categorization aims to differentiate subcategories, such as species of birds, models of cars, and variants of aircraft. It often suffers from small inter-class variance and large intra-class variance. To keep dissimilar images far apart while preserving large intra-class variance, we propose an adaptive triplet model. First, images are batched as triplets and fed to a general convolutional network, which extracts convolutional image features. Then, we combine an adaptive triplet loss and a classification loss for multi-task training. The adaptive triplet loss pulls same-class embeddings together and pushes examples from different subcategories apart. It adaptively assigns different weights to hard and easy examples during training. Unlike previous hard mining mechanisms that discard all non-hard triplets, it can benefit from all informative examples. Moreover, a second-order distance function is proposed to capture local pairwise interactions of embeddings, making the distance measure more discriminative. The classification loss provides more direct supervision for training embeddings with category-specific concepts. It also makes category prediction more convenient and efficient at test time. Experiments demonstrate state-of-the-art results on three popular fine-grained datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft. In addition, our network structure is relatively simple compared with previous methods, which often rely on multiple sub-networks and complex training mechanisms. It is also applicable to most up-to-date backbone networks, whereas other methods may be restricted to specific convolutional networks.
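
The abstract does not give the loss formulas, so the following is only a minimal sketch of the general idea of weighting triplets by difficulty and adding a classification term. Everything specific in it is an assumption made for illustration: the softmax weighting over hinge violations, the `margin`, `temperature`, and `lam` parameters, and all function names are hypothetical. A plain squared Euclidean distance stands in for the paper's second-order distance function, which is not specified here.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance (a stand-in, not the paper's second-order distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_triplet_loss(triplets, margin=1.0, temperature=1.0):
    """Weighted hinge triplet loss over (anchor, positive, negative) embeddings.

    Assumption: weights are a softmax over per-triplet hinge violations, so
    harder triplets get larger weights while easier ones are down-weighted
    rather than discarded. This is a hypothetical weighting scheme.
    """
    violations = [
        max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)
        for a, p, n in triplets
    ]
    # Numerically stable softmax over the violations.
    peak = max(violations)
    exps = [math.exp((v - peak) / temperature) for v in violations]
    total = sum(exps)
    weights = [e / total for e in exps]
    loss = sum(w * v for w, v in zip(weights, violations))
    return loss, weights


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example with an integer class label."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(triplets, logits, label, lam=1.0):
    """Triplet term plus a classification term scaled by lam (hypothetical weight)."""
    triplet_term, _ = adaptive_triplet_loss(triplets)
    return triplet_term + lam * cross_entropy(logits, label)


if __name__ == "__main__":
    # Toy 2-D embeddings: one easy, one hard, and one borderline triplet.
    toy_triplets = [
        ([0.0, 0.0], [1.0, 0.0], [3.0, 0.0]),
        ([0.0, 0.0], [2.0, 0.0], [1.0, 0.0]),
        ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0]),
    ]
    loss, weights = adaptive_triplet_loss(toy_triplets)
    print("triplet loss:", loss)
    print("weights:", weights)
    print("multi-task loss:", multitask_loss(toy_triplets, [2.0, 0.5], 0))
```

In this sketch a triplet that already satisfies the margin contributes nothing to the loss, whereas a borderline triplet still contributes with a reduced weight instead of being dropped, which is the contrast with hard mining that the abstract describes.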
