Improving generalization by mimicking the human visual diet (Open Access)
Author(s)
Spandan Madan,
You Li,
Mengmi Zhang,
Hanspeter Pfister,
Gabriel Kreiman
Publication year: 2024
Abstract
We present a new perspective on bridging the generalization gap between biological and computer vision: mimicking the human visual diet. While computer vision models rely on internet-scraped datasets, humans learn from a limited number of 3D scenes under diverse real-world transformations, with objects appearing in natural context. Our results demonstrate that incorporating variations and contextual cues ubiquitous in the human visual training data (visual diet) significantly improves generalization to real-world transformations such as lighting, viewpoint, and material changes. This improvement also extends to generalizing from synthetic to real-world data: all models trained with a human-like visual diet outperform specialized architectures by large margins when tested on natural image data. These experiments are enabled by our two key contributions: a novel dataset capturing scene context and diverse real-world transformations to mimic the human visual diet, and a transformer model tailored to leverage these aspects of the human visual diet. All data and source code can be accessed at https://github.com/Spandan-Madan/human_visual_diet.
Language(s): English