Open Access
ChildDiffusion: Unlocking the Potential of Generative AI and Controllable Augmentations for Child Facial Data using Stable Diffusion and Large Language Models
Author(s) -
Muhammad Ali Farooq,
Wang Yao,
Peter Corcoran
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3575964
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Ensuring the availability of child facial datasets is essential for advancing AI applications, yet legal and ethical concerns, together with data scarcity, pose significant challenges. Current generative models such as StyleGAN excel at producing synthetic facial data but struggle with temporal consistency, control over output attributes, and diversity in rendered features. These limitations underscore the need for a more robust and adaptable framework. In this research, we propose the ChildDiffusion framework, designed to generate photorealistic child facial data using diffusion models. The framework integrates intelligent augmentations via short text prompts, employs various image samplers, and leverages ControlNet for enhanced model conditioning. Additionally, we use large language models (LLMs) to provide complex textual guidance for precise image-to-image transformations, facilitating the curation of diverse, high-quality datasets. The model was validated by generating child faces with varied ethnicities, facial expressions, poses, lighting conditions, eye-blinking effects, accessories, hair colors, and multi-subject compositions. To demonstrate its potential, we open-sourced a dataset of 2.5k child facial samples across five ethnic classes, which underwent rigorous qualitative and quantitative evaluation. Further, we fine-tuned a Vision Transformer model to classify child ethnicity as a downstream task, demonstrating the framework's utility. This research advances generative AI by addressing data scarcity and ethical challenges, showcasing how diffusion models can produce realistic child facial data while ensuring compliance with privacy standards. The versatile ChildDiffusion framework offers broad potential for machine learning applications, serving as a valuable tool for AI innovation. The project website, along with the complete ChildRace dataset and the fine-tuned model, is available at https://mali-farooq.github.io/childdiffusion/.
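The abstract does not include code, but as a rough illustration of the generation pipeline it describes (Stable Diffusion with ControlNet conditioning driven by short text prompts), the following sketch uses the Hugging Face diffusers library. The model checkpoints, the Canny-edge conditioning choice, the prompt, and the file names are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not the authors' released code): generating a conditioned
# face image with Stable Diffusion + ControlNet via Hugging Face diffusers.
# Checkpoints, prompt, and conditioning image are illustrative assumptions.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# ControlNet conditioned on Canny edges; other conditioning types could be swapped in.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # generic base checkpoint, not the paper's fine-tune
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# A short text prompt drives the augmentation (expression, lighting, accessories, etc.).
prompt = "photorealistic portrait of a smiling child, soft studio lighting"
condition = load_image("edge_map.png")  # hypothetical edge-map conditioning image

image = pipe(prompt, image=condition, num_inference_steps=30).images[0]
image.save("child_face.png")
```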
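Similarly, a hedged sketch of the downstream task, fine-tuning a Vision Transformer on the five-class ChildRace data with Hugging Face transformers. The base checkpoint, folder layout, and hyperparameters are illustrative assumptions, not the authors' published training setup.

```python
# Hypothetical ViT fine-tuning setup for five-class child-ethnicity classification.
import torch
from datasets import load_dataset
from transformers import (ViTForImageClassification, ViTImageProcessor,
                          Trainer, TrainingArguments)

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=5  # five ethnicity classes
)

# Assumed local layout: childrace/<class_name>/*.png
ds = load_dataset("imagefolder", data_dir="childrace/")

def transform(batch):
    # Resize and normalize PIL images into the pixel tensors the ViT expects.
    out = processor([img.convert("RGB") for img in batch["image"]],
                    return_tensors="pt")
    out["labels"] = batch["label"]
    return out

ds = ds.with_transform(transform)

def collate(examples):
    # Stack per-example tensors into a training batch.
    return {"pixel_values": torch.stack([e["pixel_values"] for e in examples]),
            "labels": torch.tensor([e["labels"] for e in examples])}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="vit-childrace", num_train_epochs=3,
                           per_device_train_batch_size=16,
                           remove_unused_columns=False),
    data_collator=collate,
    train_dataset=ds["train"],
)
trainer.train()
```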
