Multi-stage Image Aesthetic Assessment via Chain-of-Thought Reasoning
Author(s) - Zhenglang Jiang, Jianhao Liu, Hongming Li, Yue Liu, Yu Song
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3590925
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Image Aesthetic Assessment (IAA) is a crucial task in computer vision that aims to quantify the aesthetic quality of images. Existing methods face two main challenges: they neglect the sequential nature of human visual perception, and multi-attribute annotation is extremely time-consuming and labor-intensive. This letter proposes a novel multi-stage IAA framework that leverages Chain-of-Thought (CoT) reasoning and Multimodal Large Language Models (MLLMs). The framework mimics the chain-like progression of human cognitive processing with three dedicated modules: Low-level Stimulus Assessment (e.g., color harmony), Holistic Organizing Assessment (e.g., scene semantics), and High-level Perceiving Assessment (e.g., emotional resonance). These modules analyze an image progressively, from basic visual features to high-level emotional understanding, using the step-by-step reasoning characteristic of CoT. Furthermore, we explore an MLLM-oriented data transformation paradigm that converts multi-source IAA datasets into structured, CoT-compatible data, which facilitates convenient attribute annotation and enables the MLLMs to learn and apply CoT reasoning for aesthetic evaluation. Experiments on the PARA dataset achieve state-of-the-art results. Moreover, the framework exhibits generalization capabilities on the unseen FLICKR-AES dataset.
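The three-stage progression described in the abstract can be illustrated with a minimal sketch in which each stage's prompt carries forward the answers of the earlier stages. This is not the authors' implementation: the stage names come from the abstract, but the prompt wording, the `run_chain` function, the `StageResult` type, and the `stub_model` placeholder (standing in for a real MLLM call) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stage names follow the abstract; the instruction text is hypothetical.
STAGES = [
    ("Low-level Stimulus Assessment",
     "Describe basic visual features such as color harmony."),
    ("Holistic Organizing Assessment",
     "Describe the overall organization, such as scene semantics."),
    ("High-level Perceiving Assessment",
     "Describe the perceived effect, such as emotional resonance."),
]


@dataclass
class StageResult:
    stage: str   # name of the assessment stage
    prompt: str  # full prompt sent to the model for this stage
    answer: str  # model output for this stage


def run_chain(image_ref: str, model: Callable[[str], str]) -> List[StageResult]:
    """Run the stages in order, feeding earlier answers into later prompts."""
    results: List[StageResult] = []
    for name, instruction in STAGES:
        # Accumulate the reasoning produced so far (empty for the first stage).
        context = "\n".join(f"[{r.stage}] {r.answer}" for r in results)
        prompt = f"Image: {image_ref}\n"
        if context:
            prompt += f"Previous reasoning:\n{context}\n"
        prompt += f"Current stage: {name}\n{instruction}"
        results.append(StageResult(name, prompt, model(prompt)))
    return results


def stub_model(prompt: str) -> str:
    # Hypothetical placeholder for an MLLM call; returns a deterministic string.
    return f"stub answer ({len(prompt)} chars of prompt)"


if __name__ == "__main__":
    for result in run_chain("example.jpg", stub_model):
        print(result.stage, "->", result.answer)
```

In this sketch the chaining is purely textual: a real system would pass the image itself to the model and would likely end with a step that maps the accumulated reasoning to an aesthetic score, neither of which is shown here.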
