A Cross-Modal Image and Text Retrieval Method Based on Efficient Feature Extraction and Interactive Learning CAE


Open Access

Author(s) - Xiuye Yin, Liyong Chen

Publication year - 2022

Publication title - Scientific Programming

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.269

H-Index - 36

eISSN - 1875-919X

pISSN - 1058-9244

DOI - 10.1155/2022/7314599

Subject(s) - computer science, artificial intelligence, pattern recognition (psychology), image retrieval, feature extraction, modal, autoencoder, deep learning, visual word, image (mathematics), chemistry, polymer chemistry

Existing shallow network structures cannot achieve high-precision image and text retrieval in complex multimodal environments. To address this, a cross-modal image and text retrieval method is proposed that combines efficient feature extraction with an interactive learning convolutional autoencoder (CAE). First, image features are extracted by a residual network whose convolution kernels are improved with two-dimensional principal component analysis (2DPCA), and text features are extracted with long short-term memory (LSTM) and word vectors, so that the features of both modalities are obtained efficiently. Then, cross-modal retrieval of images and text is realized with the interactive learning CAE: the image and text features are fed to the two inputs of the dual-modal CAE, and an image-text relationship model is learned through interactive learning in the middle layer. Finally, the proposed method is evaluated experimentally on the Flickr30K, MSCOCO, and Pascal VOC 2007 datasets. The results show that the method performs both image retrieval and text retrieval accurately. Its mean average precision (MAP) exceeds 0.3, the area under its precision-recall (PR) curves is better than that of the comparison methods, and the method is applicable in practice.
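The abstract reports mean average precision (MAP) as its headline metric without stating how it is computed. The sketch below shows one common way to compute MAP for ranked retrieval results; it is not taken from the paper. The function names and the binary relevance lists are hypothetical, and the code assumes each ranked list covers the whole database, so every relevant item appears somewhere in the list.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one query.

    relevance[i] is 1 if the item at rank i + 1 is relevant to the query,
    else 0. Assumption: the list ranks the full database, so the number of
    hits equals the number of relevant items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_relevance: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precision values."""
    if not all_relevance:
        return 0.0
    return sum(average_precision(r) for r in all_relevance) / len(all_relevance)


if __name__ == "__main__":
    # Two hypothetical queries (e.g. text queries retrieving images).
    ranked = [[1, 0, 1, 0], [0, 1, 0, 0]]
    print(round(mean_average_precision(ranked), 4))
```

For the two hypothetical queries above, the per-query values are (1/1 + 2/3) / 2 = 5/6 and 1/2, giving a MAP of 2/3. In a cross-modal setting the same computation would be run once for image-to-text queries and once for text-to-image queries.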
