Open Access
A Neighbourhood Encoding Framework for Deep Mining Heterogeneous Texts in Recipe-image Retrieval
Author(s) - Changsheng Zhu, Nan Ji, Jianhua Yu, Dazhi Jiang, Lin Zheng
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1813/1/012029
Subject(s) - computer science, modality (human–computer interaction), modalities, linear subspace, encoding (memory), artificial intelligence, information retrieval, subspace topology, image (mathematics), natural language processing, data mining, mathematics, social science, geometry, sociology
Cross-modal retrieval usually bridges the semantic gap between different modalities through a shared subspace. However, existing methods rarely consider that the data within a single modality may themselves be heterogeneous when mapping multimodal data into that shared subspace. In addition, most existing methods focus on semantic associations between different modalities, while few consider the semantic associations within a single modality. To address these two deficiencies, we propose a Neighbourhood Encoding (NE) framework that mines the semantic associations of data in the same modality and mitigates data heterogeneity by improving the semantic expression of each individual modality. To verify the effectiveness of the proposed framework, we instantiate it with two types of recurrent neural networks. Experiments show that the instantiated approaches outperform existing advanced methods in both text-to-image and image-to-text retrieval directions.
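As an illustration of the general idea (not the authors' exact architecture), the sketch below shows one plausible way a neighbourhood-encoding module could work: a recurrent network aggregates the embeddings of an item's same-modality neighbours, the result is fused with the item's own embedding, and the fused vector is projected into the shared subspace used for retrieval. The class name, dimensions, and use of a GRU are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighbourhoodEncoder(nn.Module):
    """Hypothetical neighbourhood-encoding module.

    A GRU summarises the embeddings of an item's same-modality
    neighbours; the summary is concatenated with the item's own
    embedding and projected into the shared retrieval subspace.
    """
    def __init__(self, dim: int = 512, shared_dim: int = 256):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.proj = nn.Linear(2 * dim, shared_dim)

    def forward(self, item_emb: torch.Tensor, neighbour_embs: torch.Tensor) -> torch.Tensor:
        # item_emb:       (batch, dim) embedding of the text or image itself
        # neighbour_embs: (batch, k, dim) embeddings of its k same-modality neighbours
        _, h = self.gru(neighbour_embs)               # h: (1, batch, dim)
        fused = torch.cat([item_emb, h.squeeze(0)], dim=-1)
        return F.normalize(self.proj(fused), dim=-1)  # point in the shared subspace

# Retrieval then ranks items of the other modality by cosine similarity
# between their shared-subspace vectors (illustrative random inputs).
text_enc, image_enc = NeighbourhoodEncoder(), NeighbourhoodEncoder()
t = text_enc(torch.randn(4, 512), torch.randn(4, 5, 512))    # 4 recipe texts
v = image_enc(torch.randn(8, 512), torch.randn(8, 5, 512))   # 8 food images
scores = t @ v.t()   # (4, 8) text-to-image similarity matrix
```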
