
Multi-stage Image Aesthetic Assessment via Chain-of-Thought Reasoning
Author(s) -
Zhenglang Jiang,
Jianhao Liu,
Hongming Li,
Yue Liu,
Yu Song
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3590925
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Image Aesthetic Assessment (IAA) is a crucial task in computer vision that aims to quantify the aesthetic quality of images. Existing methods face two main challenges: they neglect to model the sequential nature of human visual perception, and multi-attribute annotation is extremely time-consuming and labor-intensive. This letter proposes a novel Multi-stage IAA framework that leverages Chain-of-Thought (CoT) reasoning and Multimodal Large Language Models (MLLMs). The framework mimics the chain-like progression of human cognitive processing through three dedicated modules: Low-level Stimulus Assessment (e.g., color harmony), Holistic Organizing Assessment (e.g., scene semantics), and High-level Perceiving Assessment (e.g., emotional resonance). Using the step-by-step reasoning characteristic of CoT, these modules progressively analyze an image, moving from basic visual features to high-level emotional understanding. Furthermore, we explore an MLLM-oriented data transformation paradigm that converts multi-source IAA datasets into structured, CoT-compatible data, making attribute annotation more convenient. This enables MLLMs to effectively learn and apply CoT reasoning for aesthetic evaluation. Experiments on the PARA dataset achieve state-of-the-art results, and the framework also exhibits generalization capability on the unseen FLICKR-AES dataset.
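The abstract does not describe the concrete data format used by the transformation paradigm. The following Python sketch is only an illustration under stated assumptions: the attribute names (`color`, `light`, `composition`, `content`, `emotion`), their assignment to the three stages, and the helper names `IAARecord` and `to_cot_sample` are all hypothetical. It shows one way an attribute-annotated IAA record could be turned into a three-step, CoT-style training sample that ends with an overall score.

```python
from dataclasses import dataclass, field

# Hypothetical mapping of annotated attributes to the three stages named in the
# abstract; the paper's actual attribute-to-stage assignment is not given here.
STAGES = [
    ("Low-level Stimulus Assessment", ["color", "light"]),
    ("Holistic Organizing Assessment", ["composition", "content"]),
    ("High-level Perceiving Assessment", ["emotion"]),
]


@dataclass
class IAARecord:
    """Hypothetical record from any source dataset (attributes may be partial)."""
    image_id: str
    overall: float                                    # overall aesthetic score
    attributes: dict[str, float] = field(default_factory=dict)


def to_cot_sample(rec: IAARecord) -> dict:
    """Build a prompt/response pair whose response reasons stage by stage."""
    steps = []
    for i, (stage, names) in enumerate(STAGES, start=1):
        # Keep only attributes this source dataset actually annotated.
        present = [f"{n}={rec.attributes[n]:.1f}" for n in names if n in rec.attributes]
        detail = ", ".join(present) if present else "no annotation"
        steps.append(f"Step {i} ({stage}): {detail}")
    steps.append(f"Final aesthetic score: {rec.overall:.1f}")
    return {
        "image": rec.image_id,
        "prompt": "Assess this image's aesthetics step by step.",
        "response": "\n".join(steps),
    }


if __name__ == "__main__":
    rec = IAARecord("img_001.jpg", 3.6, {"color": 3.8, "composition": 3.2, "emotion": 4.0})
    print(to_cot_sample(rec)["response"])
```

Marking missing attributes as "no annotation" rather than dropping the stage keeps the reasoning chain the same length for every record, which is one simple way to merge datasets that annotate different attribute subsets.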