
Open Access
Improving generalization by mimicking the human visual diet
Author(s):
Spandan Madan,
You Li,
Mengmi Zhang,
Hanspeter Pfister,
Gabriel Kreiman
Publication year: 2024
We present a new perspective on bridging the generalization gap between biological and computer vision -- mimicking the human visual diet. While computer vision models rely on internet-scraped datasets, humans learn from limited 3D scenes under diverse real-world transformations with objects in natural context. Our results demonstrate that incorporating variations and contextual cues ubiquitous in the human visual training data (visual diet) significantly improves generalization to real-world transformations such as lighting, viewpoint, and material changes. This improvement also extends to generalizing from synthetic to real-world data -- all models trained with a human-like visual diet outperform specialized architectures by large margins when tested on natural image data. These experiments are enabled by our two key contributions: a novel dataset capturing scene context and diverse real-world transformations to mimic the human visual diet, and a transformer model tailored to leverage these aspects of the human visual diet. All data and source code can be accessed at https://github.com/Spandan-Madan/human_visual_diet.
Language(s): English
