KingdomGlimpses: Evaluating Saudi Cultural Representation through Text-to-Image Models
Author(s) -
Nada Almarwani,
Samah Aloufi,
Sakhar Alkhereyf,
Manal Alhassoun,
Manal Almutery,
Nouf Alshalawi,
Abdulmohsen Al-Thubaity
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3619432
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Recent advances in text-to-image generation models, such as DALL·E 3 and Gemini, have enabled the mass production of artworks based on simple text prompts. However, capturing cultural depth, richness, and nuance remains one of the main challenges limiting the democratization and inclusivity of generative models. In this work, we investigate and compare two text-to-image generative models, DALL·E 3 and Gemini, on their ability to understand, capture, and represent various aspects of the Saudi cultural legacy, including regional cuisine, traditional fashion, and prominent sites. We used a variety of hand-crafted and LLM-generated prompts (the latter produced with GPT-4) to guide the text-to-image models in generating images of Saudi cultural artifacts, and subsequently evaluated the outputs. We relied on human evaluation guided by a three-fold framework to assess key dimensions of generative model performance: cultural awareness in the LLM-generated prompts, cultural sensitivity and authenticity in the text-to-image output, and visual and semantic fidelity of the generated images. Our findings indicate that, despite generating high-quality images, both models often struggle to faithfully represent specific cultural artifacts. Hence, further research is needed to improve cultural representation in text-to-image models and to enhance their understanding and perception of diverse cultures.