Improving generalization by mimicking the human visual diet (Open Access)
Author(s)
Spandan Madan,
You Li,
Mengmi Zhang,
Hanspeter Pfister,
Gabriel Kreiman
Publication year: 2024
Abstract
We present a new perspective on bridging the generalization gap between biological and computer vision: mimicking the human visual diet. While computer vision models rely on internet-scraped datasets, humans learn from a limited number of 3D scenes under diverse real-world transformations, with objects appearing in natural context. Our results demonstrate that incorporating variations and contextual cues ubiquitous in the human visual training data (visual diet) significantly improves generalization to real-world transformations such as lighting, viewpoint, and material changes. This improvement also extends to generalizing from synthetic to real-world data: all models trained with a human-like visual diet outperform specialized architectures by large margins when tested on natural image data. These experiments are enabled by our two key contributions: a novel dataset capturing scene context and diverse real-world transformations to mimic the human visual diet, and a transformer model tailored to leverage these aspects of the human visual diet. All data and source code can be accessed at https://github.com/Spandan-Madan/human_visual_diet.
Language(s): English