Open Access
MCATD: Multi-scale Contextual Attention Transformer Diffusion for Unsupervised Low-light Image Enhancement
Author(s) -
Cheng Da,
Yongsheng Qian,
Junwei Zeng,
Xuting Wei,
Futao Zhang
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3573171
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Low-light image enhancement (LLIE) remains a challenging task due to the complex degradation patterns in images captured under insufficient illumination, including non-linear intensity mappings, spatially-varying noise distributions, and content-dependent color distortions. Despite significant advances, existing methods struggle with three fundamental challenges: (1) difficulty in simultaneously preserving structural details while reducing noise, (2) limited generalization across diverse lighting conditions and scene types, and (3) computational inefficiency when processing complex natural scenes. While recent diffusion-based methods have shown promise, they often struggle with generalization and require paired training data. We propose MCATD, a novel unsupervised framework that integrates adaptive sampling, multi-scale feature extraction, and dynamic enhancement capabilities into diffusion models for LLIE. The framework consists of three key components: (1) a Dynamic Adaptive Diffusion Sampling (DADS) strategy that adjusts sampling steps based on image complexity, (2) a Multi-scale Contextual Attention Transformer (MCAT) network that captures features at different scales with attention mechanisms, and (3) a Multi-scale Dynamic Structure-Preserving (MDSP) loss that preserves image structure while optimizing perceptual quality. Experimental results on multiple benchmarks demonstrate that our method outperforms state-of-the-art unsupervised approaches and achieves comparable performance to supervised methods while maintaining better generalization ability. Furthermore, ablation studies validate the effectiveness of each proposed component. The proposed framework not only advances the field of unsupervised LLIE but also provides insights into leveraging diffusion models for image restoration tasks.
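The paper does not spell out the DADS rule in this abstract, but the core idea of adjusting diffusion sampling steps to image complexity can be sketched as follows. This is an illustrative toy, not the authors' method: the complexity measure (mean gradient magnitude), the `scale` constant, and the step range are all assumptions chosen for demonstration.

```python
import numpy as np

def estimate_complexity(img):
    """Rough image-complexity score: mean gradient magnitude of a
    2-D grayscale array with values in [0, 1]."""
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))

def adaptive_num_steps(img, min_steps=10, max_steps=50, scale=20.0):
    """Map complexity to a diffusion sampling budget: flat regions get
    few denoising steps, texture-rich images get more (hypothetical
    mapping; the actual DADS strategy may differ)."""
    c = estimate_complexity(img)
    frac = 1.0 - np.exp(-scale * c)   # squash complexity into [0, 1)
    return int(round(min_steps + frac * (max_steps - min_steps)))
```

For example, a uniform image yields the minimum budget, while a noisy or highly textured image is pushed toward the maximum, which matches the abstract's claim that sampling effort should track scene complexity.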
