Multi-stage Image Aesthetic Assessment via Chain-of-Thought Reasoning

Open Access

Author(s) - Zhenglang Jiang, Jianhao Liu, Hongming Li, Yue Liu, Yu Song

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3590925

Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation

Image Aesthetic Assessment (IAA) is a crucial task in computer vision that aims to quantify the aesthetic quality of images. Existing methods face two main challenges: they neglect the sequential nature of human visual perception, and multi-attribute annotation is extremely time-consuming and labor-intensive. This letter proposes a novel multi-stage IAA framework that leverages Chain-of-Thought (CoT) reasoning and Multimodal Large Language Models (MLLMs). The framework mimics the chain-like progression of human cognitive processing with three dedicated modules: Low-level Stimulus Assessment (e.g., color harmony), Holistic Organizing Assessment (e.g., scene semantics), and High-level Perceiving Assessment (e.g., emotional resonance). These modules analyze images progressively, from basic visual features to high-level emotional understanding, using the step-by-step reasoning process characteristic of CoT. Furthermore, we explore an MLLM-oriented data transformation paradigm that converts multi-source IAA datasets into structured, CoT-compatible data, which makes attribute annotation more convenient and enables the MLLMs to learn and apply CoT reasoning for aesthetic evaluation. Experiments on the PARA dataset achieve state-of-the-art results. Moreover, the framework exhibits generalization capabilities on the unseen FLICKR-AES dataset.
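
The abstract does not spell out the data transformation itself, so the following is only a minimal sketch of what converting an attribute-annotated record into a three-stage, CoT-style training sample might look like. The three stage names come from the abstract; the `Annotation` class, the attribute keys, the prompt wording, and the `to_cot_sample` function are all hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass

# Stage names follow the abstract. The attribute keys assigned to each
# stage are hypothetical placeholders, not the paper's actual schema.
STAGES = [
    ("Low-level Stimulus Assessment", ["color_harmony"]),
    ("Holistic Organizing Assessment", ["scene_semantics"]),
    ("High-level Perceiving Assessment", ["emotional_resonance"]),
]


@dataclass
class Annotation:
    """Hypothetical record: per-attribute scores plus an overall score."""
    image_id: str
    attributes: dict[str, float]
    overall_score: float


def to_cot_sample(ann: Annotation) -> dict[str, str]:
    """Render one annotation as an ordered, stage-by-stage text response."""
    steps = []
    for idx, (stage, keys) in enumerate(STAGES, start=1):
        # Keep only the attributes this record actually annotates.
        parts = [
            f"{key.replace('_', ' ')} = {ann.attributes[key]:.1f}"
            for key in keys
            if key in ann.attributes
        ]
        detail = "; ".join(parts) if parts else "no annotation available"
        steps.append(f"Step {idx} ({stage}): {detail}")
    # The overall score comes last, after the three reasoning stages.
    steps.append(f"Final aesthetic score: {ann.overall_score:.1f}")
    return {
        "image_id": ann.image_id,
        "prompt": "Assess the aesthetics of this image step by step.",
        "response": "\n".join(steps),
    }


if __name__ == "__main__":
    example = Annotation(
        image_id="img_001",
        attributes={"color_harmony": 4.0, "emotional_resonance": 3.5},
        overall_score=3.8,
    )
    print(to_cot_sample(example)["response"])
```

Ordering the response from low-level to high-level stages mirrors the progression described in the abstract; records from sources that lack some attributes still produce a sample, with the missing stage marked explicitly rather than filled in.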