Open Access
Decoupled Latent Diffusion Model for Enhancing Image Generation
Author(s) -
Hyun-Tae Choi,
Kensuke Nakamura,
Byung-Woo Hong
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3592163
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Latent Diffusion Models have emerged as an efficient alternative to conventional diffusion approaches by compressing high-dimensional images into a lower-dimensional latent space using a Variational Autoencoder (VAE) and performing diffusion in that space. In a standard Latent Diffusion Model (LDM), the latent code is formed by sampling from a Gaussian distribution (i.e., combining both the mean and the standard deviation), which helps regularize the latent space but appears to contribute little beyond the deterministic component. Motivated by recent empirical observations that the decoder relies primarily on the latent mean, our work reexamines this paradigm and proposes a decoupled latent diffusion model that focuses on a simplified latent representation. Specifically, we compare three configurations: (i) the standard sampled latent code, (ii) a concatenated representation that explicitly preserves both the mean and the variance, and (iii) a deterministic mean-only representation. Our extensive experiments on multiple benchmark datasets demonstrate that, compared to the standard approach, the mean-only configuration not only maintains but in many cases improves synthesis quality, producing sharper and more coherent images while reducing unnecessary noise. These findings suggest that a simplified, deterministic latent representation can yield more stable and efficient generative models, challenging the conventional reliance on latent sampling in diffusion-based image synthesis.
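The three latent configurations compared in the abstract can be sketched as follows. This is a minimal NumPy illustration under assumed conventions (a VAE encoder that outputs a mean and a log-variance); the function name, argument shapes, and mode labels are hypothetical and do not reproduce the authors' implementation:

```python
import numpy as np

def make_latent(mean, logvar, mode="sampled", rng=None):
    """Build the latent code fed to the diffusion process.

    Illustrative sketch of the three configurations compared in the paper:
      - "sampled":   standard LDM latent, z = mu + sigma * eps (reparameterization)
      - "concat":    mean and standard deviation stacked along the channel axis
      - "mean_only": deterministic latent, the noise term is dropped entirely
    """
    if mode == "sampled":
        rng = rng or np.random.default_rng(0)
        eps = rng.standard_normal(mean.shape)          # eps ~ N(0, I)
        return mean + np.exp(0.5 * logvar) * eps       # z = mu + sigma * eps
    if mode == "concat":
        return np.concatenate([mean, np.exp(0.5 * logvar)], axis=0)
    if mode == "mean_only":
        return mean                                    # z = mu, no sampling
    raise ValueError(f"unknown mode: {mode!r}")
```

Note that the concatenated variant doubles the channel dimension of the latent, whereas the mean-only variant keeps the standard latent shape while removing the stochastic component.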
