A Generation Algorithm for “Text to Image” Based on Multi-Channel Attention

Yang Yang, Ainuddin Wahid Bin Abdul Wahab, Norisma Binti Idris, Dingguo Yu, Chang Liu

Open Access

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3596894

Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation

Research on text-to-image has gained significant attention. However, existing methods primarily rely on upsampling convolution operations for feature extraction during the initial image generation stage. This approach has inherent limitations, often leading to the loss of global information and the inability to capture long-range semantic dependencies. To address these issues, this study proposes a generation algorithm for “text to image” based on multi-channel attention (TTI-MCA). The method integrates a self-supervised module into the initial image generation phase, leveraging attention mechanisms to enable autonomous mapping learning between image features. This facilitates a deep integration of contextual understanding and self-attention learning. Additionally, a feature fusion enhancement module is introduced, which combines low-resolution features from the previous stage with high-resolution features from the current stage. This allows the generation network to fully utilize the rich semantic information of low-level features and the high-resolution details of high-level features, ultimately producing high-quality, realistic images. Experimental results show that TTI-MCA outperforms the baseline algorithm in both Inception Score (IS) and Fréchet Inception Distance (FID), achieving superior performance on the CUB and COCO datasets. This research provides a novel approach to generating high-quality images from text.
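The two ideas named in the abstract, attention over image features and fusion of low-resolution with high-resolution features, can be illustrated with a minimal sketch. This is not the TTI-MCA implementation: the abstract does not give the module designs, so the sketch assumes plain scaled dot-product self-attention with identity projections (no learned weights), nearest-neighbour upsampling, and channel concatenation as the fusion step. All function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over N positions of C channels.

    Assumption: queries, keys and values are the features themselves
    (identity projections); a trained model would use learned projections.
    """
    dim = len(features[0])
    out = []
    for q in features:
        # Similarity of this position to every position, including distant ones.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in features]
        weights = softmax(scores)
        # Each output vector is a weighted mix of all positions' features.
        out.append([sum(w * v[c] for w, v in zip(weights, features))
                    for c in range(dim)])
    return out


def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour upsampling of an H x W x C map (nested lists)."""
    out = []
    for row in fmap:
        wide = [list(px) for px in row for _ in range(factor)]
        for _ in range(factor):
            out.append([list(px) for px in wide])
    return out


def fuse(low_res, high_res):
    """Upsample low_res to high_res's size, then concatenate channels per pixel.

    Assumption: concatenation stands in for the paper's fusion module.
    """
    factor = len(high_res) // len(low_res)
    up = upsample_nearest(low_res, factor)
    return [[u + h for u, h in zip(up_row, hi_row)]
            for up_row, hi_row in zip(up, high_res)]


if __name__ == "__main__":
    positions = [[1.0, 0.0], [0.0, 1.0]]           # 2 positions, 2 channels
    print(self_attention(positions))
    low = [[[1.0]]]                                 # 1 x 1 x 1 map
    high = [[[0.1], [0.2]], [[0.3], [0.4]]]         # 2 x 2 x 1 map
    print(fuse(low, high))
```

Because every output position in `self_attention` mixes information from all positions, it can relate distant regions of a feature map, which is the kind of long-range dependency that a purely local upsampling convolution does not capture.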
