Open Access
ChildDiffusion: Unlocking the Potential of Generative AI and Controllable Augmentations for Child Facial Data using Stable Diffusion and Large Language Models
Author(s) -
Muhammad Ali Farooq,
Wang Yao,
Peter Corcoran
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3575964
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Ensuring the availability of child facial datasets is essential for advancing AI applications, yet legal and ethical concerns, together with data scarcity, pose significant challenges. Current generative models such as StyleGAN excel at producing synthetic facial data but struggle with temporal consistency, control over output attributes, and diversity in rendered features. These limitations underscore the need for a more robust and adaptable framework. In this research, we propose the ChildDiffusion framework, designed to generate photorealistic child facial data using diffusion models. The framework integrates intelligent augmentations via short text prompts, employs various image samplers, and leverages ControlNet for enhanced model conditioning. Additionally, we use large language models (LLMs) to provide complex textual guidance for precise image-to-image transformations, facilitating the curation of diverse, high-quality datasets. The model was validated by generating child faces with varied ethnicities, facial expressions, poses, lighting conditions, eye-blinking effects, accessories, hair colors, and multi-subject compositions. To demonstrate its potential, we open-sourced a dataset of 2.5k child facial samples across five ethnic classes, which underwent rigorous qualitative and quantitative evaluation. Further, we fine-tuned a Vision Transformer model to classify child ethnicity as a downstream task, demonstrating the framework's utility. This research advances generative AI by addressing data scarcity and ethical challenges, showcasing how diffusion models can produce realistic child facial data while ensuring compliance with privacy standards. The versatile ChildDiffusion framework offers broad potential for machine learning applications, serving as a valuable tool for AI innovation. The project website, along with the complete ChildRace dataset and the fine-tuned model, is available at https://mali-farooq.github.io/childdiffusion/.
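The abstract does not include code, but as a rough illustration of the generation pipeline it describes (Stable Diffusion with ControlNet conditioning driven by short text prompts), the following sketch uses the Hugging Face diffusers library. The model checkpoints, the Canny-edge conditioning choice, the prompt, and the file names are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not the authors' released code): generating a conditioned
# face image with Stable Diffusion + ControlNet via Hugging Face diffusers.
# Checkpoints, prompt, and conditioning image are illustrative assumptions.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# ControlNet conditioned on Canny edges; other conditioning types could be swapped in.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # generic base checkpoint, not the paper's fine-tune
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# A short text prompt drives the augmentation (expression, lighting, accessories, etc.).
prompt = "photorealistic portrait of a smiling child, soft studio lighting"
condition = load_image("edge_map.png")  # hypothetical edge-map conditioning image

image = pipe(prompt, image=condition, num_inference_steps=30).images[0]
image.save("child_face.png")
```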
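Similarly, a hedged sketch of the downstream task, fine-tuning a Vision Transformer on the five-class ChildRace data with Hugging Face transformers. The base checkpoint, folder layout, and hyperparameters are illustrative assumptions, not the authors' published training setup.

```python
# Hypothetical ViT fine-tuning setup for five-class child-ethnicity classification.
import torch
from datasets import load_dataset
from transformers import (ViTForImageClassification, ViTImageProcessor,
                          Trainer, TrainingArguments)

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=5  # five ethnicity classes
)

# Assumed local layout: childrace/<class_name>/*.png
ds = load_dataset("imagefolder", data_dir="childrace/")

def transform(batch):
    # Resize and normalize PIL images into the pixel tensors the ViT expects.
    out = processor([img.convert("RGB") for img in batch["image"]],
                    return_tensors="pt")
    out["labels"] = batch["label"]
    return out

ds = ds.with_transform(transform)

def collate(examples):
    # Stack per-example tensors into a training batch.
    return {"pixel_values": torch.stack([e["pixel_values"] for e in examples]),
            "labels": torch.tensor([e["labels"] for e in examples])}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="vit-childrace", num_train_epochs=3,
                           per_device_train_batch_size=16,
                           remove_unused_columns=False),
    data_collator=collate,
    train_dataset=ds["train"],
)
trainer.train()
```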
