Multimodal Structure-Consistent Image-to-Image Translation

Open Access

Author(s) - Che-Tsung Lin, Yen-Yi Wu, Po-Hao Hsu, ShangHong Lai

Publication year - 2020

Publication title - Proceedings of the AAAI Conference on Artificial Intelligence

Language(s) - English

Resource type - Journals

eISSN - 2374-3468

pISSN - 2159-5399

DOI - 10.1609/aaai.v34i07.6814

Subject(s) - artificial intelligence, computer science, image translation, boosting (machine learning), translation (biology), pattern recognition (psychology), object detection, domain (mathematical analysis), image (mathematics), metric (unit), computer vision, consistency (knowledge bases), detector, object (grammar), mathematics, mathematical analysis, telecommunications, biochemistry, chemistry, operations management, messenger rna, economics, gene

Unpaired image-to-image translation has proven quite effective in boosting a CNN-based object detector for a different domain, by means of data augmentation that preserves the objects in the translated images well. Recently, multimodal GAN (Generative Adversarial Network) models have been proposed and were expected to further boost detector accuracy by generating a diverse collection of images in the target domain from only a single labelled image in the source domain. However, detectors trained on images generated by multimodal GANs achieve even worse detection accuracy than those trained on images from a unimodal GAN with better object preservation. In this work, we introduce cycle-structure consistency for generating diverse and structure-preserved translated images across complex domains, such as between day and night, for object detector training. Qualitative results show that our model, Multimodal AugGAN, can generate diverse and realistic images for the target domain. For quantitative comparisons, we evaluate competing methods and ours by using the generated images to train YOLO, Faster R-CNN and FCN models, and show that our model achieves significant improvement and outperforms other methods in detection accuracy and FCN scores. We also demonstrate, through comparison on the perceptual distance metric, that our model provides more diverse object appearances in the target domain.
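
The abstract does not say how the perceptual distance comparison is computed. One common way to summarise diversity is the average pairwise distance between several outputs generated from the same input, which the sketch below illustrates. It is only an illustration under stated assumptions: plain Euclidean distance on toy feature vectors stands in for a learned perceptual metric, and the names `l2_distance` and `mean_pairwise_distance` are hypothetical, not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors.

    Assumption: a stand-in for a learned perceptual distance.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_pairwise_distance(features, distance=l2_distance):
    """Average distance over all unordered pairs of samples."""
    pairs = list(combinations(features, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


# Toy vectors standing in for embeddings of translated images (made-up data).
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # identical outputs
diverse = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]    # spread-out outputs

print(mean_pairwise_distance(collapsed))
print(mean_pairwise_distance(diverse))
```

Under this measure, a generator whose outputs collapse to one appearance scores zero, while a higher score indicates more varied outputs. It says nothing about whether objects are preserved, which is why the abstract reports detection accuracy separately.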
