Comparative Perspective of Visual Attention: From Human Focus to Visual Transformers. An In-Depth Review
Author(s) - Luis Oliveros, Miguel Carrasco, Jose Aranda, Cesar Gonzalez-Martin
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3615404
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Although neuroscience has made considerable progress in recent decades by proposing robust models that explain the mechanisms of attention and perception in humans, emulating this capability with computational techniques remains complex. It was not until the development of models such as Visual Transformers (ViT) that it became possible to partially replicate this uniquely human trait. The main objective of this study was to explore the extent to which attention models, such as ViT, can reproduce the manner in which people distribute their visual attention when exposed to various stimuli, particularly in the context of handcrafted objects. Human fixations (i.e., attention) were recorded using an eye tracker, while the ViT model processed the same images to generate attention maps, allowing the degree of similarity between the two patterns to be evaluated. For this purpose, heatmaps were constructed, and quantitative metrics were applied to assess their similarity. The results revealed areas of convergence as well as significant differences, highlighting the current limitations of computational models in capturing the more subtle aspects of human perception. This comparison not only helps us better understand the capabilities of ViT but also provides a foundation for reflecting on future improvements in automated attention models and their potential applications in contexts where visual interpretation is crucial.
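The abstract does not specify which ViT checkpoint, attention-extraction scheme, or similarity metric the authors used. The following is only a minimal sketch of the general pipeline it describes, under stated assumptions: the publicly available HuggingFace checkpoint "google/vit-base-patch16-224", last-layer CLS-to-patch attention averaged over heads, Pearson correlation (CC) as the similarity metric, and hypothetical file names for the stimulus image and the eye-tracking fixation heatmap.

```python
# Sketch: compare a ViT attention map with a human fixation heatmap.
# Assumptions (not taken from the paper): HuggingFace "google/vit-base-patch16-224",
# last-layer CLS-to-patch attention averaged over heads, Pearson correlation (CC),
# and hypothetical input files "handcrafted_object.jpg" / "fixation_heatmap.npy".
import numpy as np
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel


def vit_attention_map(image: Image.Image, size: int = 224) -> np.ndarray:
    """Return a (size x size) attention map from the CLS token of the last layer."""
    processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
    model = ViTModel.from_pretrained("google/vit-base-patch16-224")
    model.eval()
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)
    # Last-layer attention: (batch, heads, tokens, tokens); token 0 is the CLS token.
    attn = outputs.attentions[-1][0].mean(dim=0)   # average over heads -> (197, 197)
    cls_to_patches = attn[0, 1:]                   # CLS attention over the 196 patch tokens
    grid = cls_to_patches.reshape(14, 14)          # 224 / 16 = 14 patches per side
    grid = torch.nn.functional.interpolate(
        grid[None, None], size=(size, size), mode="bilinear", align_corners=False
    )[0, 0]
    grid = (grid - grid.min()) / (grid.max() - grid.min() + 1e-8)  # normalize to [0, 1]
    return grid.numpy()


def correlation_coefficient(model_map: np.ndarray, fixation_map: np.ndarray) -> float:
    """Pearson correlation (CC), a common heatmap-similarity metric in saliency research."""
    a = (model_map - model_map.mean()) / (model_map.std() + 1e-8)
    b = (fixation_map - fixation_map.mean()) / (fixation_map.std() + 1e-8)
    return float((a * b).mean())


if __name__ == "__main__":
    image = Image.open("handcrafted_object.jpg").convert("RGB")  # hypothetical stimulus
    fixations = np.load("fixation_heatmap.npy")                  # hypothetical 224x224 heatmap
    vit_map = vit_attention_map(image)
    print("CC(ViT, human) =", correlation_coefficient(vit_map, fixations))
```

Other metrics commonly reported for this kind of comparison, such as KL divergence, NSS, or AUC, could be substituted for the correlation function above; the paper itself should be consulted for the exact metrics it applied.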
