
Global-Affine and Local-Specific Generative Adversarial Network for semantic-guided image generation
Author(s) -
Susu Zhang,
Jian Ni,
Lei Hou,
Zili Zhou,
Jie Hou,
Feng Gao
Publication year - 2021
Publication title -
Mathematical Foundations of Computing
Language(s) - English
Resource type - Journals
ISSN - 2577-8838
DOI - 10.3934/mfc.2021009
Subject(s) - computer science , affine transformation , artificial intelligence , generative grammar , rendering (computer graphics) , generator (circuit theory) , feature (linguistics) , fidelity , image (mathematics) , graph , pattern recognition (psychology) , theoretical computer science , mathematics , telecommunications , power (physics) , linguistics , physics , philosophy , quantum mechanics , pure mathematics
The recent progress in learning image feature representations has opened the way for tasks such as label-to-image or text-to-image synthesis. However, one particular challenge widely observed in existing methods is the difficulty of synthesizing fine-grained textures and small-scale instances. In this paper, we propose a novel Global-Affine and Local-Specific Generative Adversarial Network (GALS-GAN) to explicitly construct global semantic layouts and learn distinct instance-level features. To achieve this, we adopt a graph convolutional network to infer instance locations and spatial relationships from scene graphs, which allows our model to obtain high-fidelity semantic layouts. In addition, a local-specific generator, which introduces a feature filtering mechanism to learn separate semantic maps for different categories, is used to disentangle and generate instance-specific visual features. Moreover, we apply a weight map predictor to better combine the global and local pathways, exploiting the strong complementarity between these two generation sub-networks. Extensive experiments on the COCO-Stuff and Visual Genome datasets demonstrate the superior generation performance of our model against previous methods; our approach is more capable of capturing photo-realistic local characteristics and rendering small-sized entities with finer detail.
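The weight-map fusion of the global and local pathways can be pictured with a minimal PyTorch sketch. The module name, layer sizes, and the per-pixel convex-combination formulation below are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn

class WeightMapFusion(nn.Module):
    """Hypothetical sketch: predict a per-pixel weight map and use it to
    blend global-pathway and local-pathway feature maps (sizes assumed)."""

    def __init__(self, channels: int):
        super().__init__()
        # Small conv head that maps the concatenated features to a weight in [0, 1].
        self.predictor = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, global_feat: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
        # Per-pixel weight decides which pathway dominates at each location.
        w = self.predictor(torch.cat([global_feat, local_feat], dim=1))
        return w * global_feat + (1.0 - w) * local_feat

# Usage on dummy feature maps (batch of 2, 64 channels, 32x32 resolution).
fusion = WeightMapFusion(channels=64)
g = torch.randn(2, 64, 32, 32)
l = torch.randn(2, 64, 32, 32)
print(fusion(g, l).shape)  # torch.Size([2, 64, 32, 32])
```

A convex combination of this kind lets the network rely on the global pathway for coarse layout while deferring to the local-specific pathway where instance detail matters, which is one plausible reading of the complementarity the abstract describes.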