Transformer with sparse self‐attention mechanism for image captioning
Author(s) - Wang Duofeng, Hu Haifeng, Chen Dihu
Publication year - 2020
Publication title - Electronics Letters
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.375
H-Index - 146
ISSN - 1350-911X
DOI - 10.1049/el.2020.0635
Subject(s) - transformer , computer science , encoder , artificial intelligence , computer vision , engineering , electrical engineering , voltage , operating system
Recently, the transformer has been applied to image captioning models, in which a convolutional neural network together with the transformer encoder serves as the image encoder, and the transformer decoder generates the caption. However, because its self-attention mechanism is dense, the transformer may suffer interference from non-critical objects in a scene and struggle to fully capture image information. To address this issue, the authors of this Letter propose a novel transformer model with decreasing attention gates and an attention fusion module. Specifically, they first use attention gates to make the transformer overcome the interference of non-critical objects and capture object information more efficiently, by truncating all attention weights smaller than the gate threshold. Second, by inheriting the attention matrix from the preceding layer, the attention fusion module enables each network layer to attend to other objects without losing the most critical ones. The method is evaluated on the benchmark Microsoft COCO dataset and achieves better performance than state-of-the-art methods.
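
As a rough illustration of the two mechanisms the abstract describes, below is a minimal PyTorch sketch of thresholded (gated) scaled dot-product attention combined with previous-layer attention fusion. The function name gated_attention, the gate value 0.05, and the fusion weight alpha are illustrative assumptions, not details taken from the Letter; the "decreasing" gates can be approximated by passing a smaller gate threshold at each successive layer.

import torch
import torch.nn.functional as F

def gated_attention(q, k, v, gate=0.05, prev_attn=None, alpha=0.5):
    """Sketch of sparse self-attention: weights below `gate` are zeroed
    (attention-gate step), and the result is optionally blended with the
    previous layer's attention matrix (attention-fusion step).
    The threshold and blend weight are illustrative, not from the paper."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    attn = F.softmax(scores, dim=-1)

    # Attention gate: truncate weights smaller than the gate threshold,
    # then renormalise so each row still sums to one.
    attn = torch.where(attn >= gate, attn, torch.zeros_like(attn))
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)

    # Attention fusion: inherit the previous layer's attention matrix so
    # relevant but less salient objects are not discarded entirely.
    if prev_attn is not None:
        attn = alpha * attn + (1 - alpha) * prev_attn

    return attn @ v, attn

# Example usage across two layers (shapes: batch, objects, feature dim):
q = k = v = torch.randn(2, 8, 64)
out, attn = gated_attention(q, k, v, gate=0.05)
out, attn = gated_attention(out, out, out, gate=0.02, prev_attn=attn)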
