Deep supervised multimodal semantic autoencoder for cross‐modal retrieval
Author(s) - Tian Yu, Yang Wenjing, Liu Qingsong, Yang Qiong
Publication year - 2020
Publication title - Computer Animation and Virtual Worlds
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.225
H-Index - 49
eISSN - 1546-427X
pISSN - 1546-4261
DOI - 10.1002/cav.1962
Subject(s) - computer science , autoencoder , benchmark , feature , artificial intelligence , modal , semantic feature , feature vector , information retrieval , semantic gap , deep learning , pattern recognition , image retrieval
Cross‐modal retrieval aims to enable flexible retrieval across different modalities; its central challenge is how to measure semantic similarities among multimodal data. Although many methods have been proposed for cross‐modal retrieval, they rarely consider preserving the content information of multimodal data. In this paper, we present a three‐stage cross‐modal retrieval method named MMCA‐CMR. To reduce the discrepancy among multimodal data, we first embed the multimodal data into a common representation space. We then combine the feature vectors with the content information to form semantic‐aware feature vectors. We finally obtain feature‐aware and content‐aware projections via multimodal semantic autoencoders. With deep semantic autoencoders, MMCA‐CMR enables more reliable cross‐modal retrieval by learning from the feature vectors of different modalities and their content information simultaneously. Extensive experiments demonstrate that the proposed method is effective for cross‐modal retrieval and significantly outperforms state‐of‐the‐art methods on four widely used benchmark datasets.
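The abstract describes a supervised multimodal semantic autoencoder: each modality is encoded into a common representation space, the original features are reconstructed to preserve content, and label (semantic) information supervises the shared embedding. The sketch below is a minimal illustration of that general idea in PyTorch, not the authors' MMCA‐CMR implementation; the module names, layer sizes, feature dimensions, and loss weights are all assumptions for demonstration.

```python
# Illustrative sketch (not the paper's code): per-modality autoencoders with a
# shared latent space, feature reconstruction (content preservation), label
# supervision (semantics), and a cross-modal alignment term. All sizes assumed.
import torch
import torch.nn as nn

class ModalityAutoencoder(nn.Module):
    """Encoder/decoder pair for one modality plus a semantic (label) head."""
    def __init__(self, feat_dim, latent_dim, num_labels):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )
        self.label_head = nn.Linear(latent_dim, num_labels)

    def forward(self, x):
        z = self.encoder(x)        # embedding in the common representation space
        x_rec = self.decoder(z)    # reconstruction of the input features
        y_hat = self.label_head(z) # semantic (label) prediction
        return z, x_rec, y_hat


def cross_modal_loss(img_out, txt_out, img_feat, txt_feat, labels,
                     alpha=1.0, beta=1.0):
    """Reconstruction + label supervision + cross-modal alignment (weights assumed)."""
    z_i, rec_i, y_i = img_out
    z_t, rec_t, y_t = txt_out
    mse, bce = nn.MSELoss(), nn.BCEWithLogitsLoss()
    recon = mse(rec_i, img_feat) + mse(rec_t, txt_feat)  # content preservation
    semantic = bce(y_i, labels) + bce(y_t, labels)       # semantic supervision
    align = mse(z_i, z_t)                                # pull paired samples together
    return recon + alpha * semantic + beta * align


if __name__ == "__main__":
    # Toy batch: 4096-d image features, 300-d text features, 24 labels (assumed sizes).
    img_ae = ModalityAutoencoder(4096, 128, 24)
    txt_ae = ModalityAutoencoder(300, 128, 24)
    img_feat, txt_feat = torch.randn(8, 4096), torch.randn(8, 300)
    labels = torch.randint(0, 2, (8, 24)).float()
    loss = cross_modal_loss(img_ae(img_feat), txt_ae(txt_feat),
                            img_feat, txt_feat, labels)
    loss.backward()
```

In a setup like this, retrieval is performed after training by encoding a query from one modality and the gallery from another with their respective encoders, then ranking gallery items by similarity (e.g., cosine distance) in the shared latent space.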