Open Access
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Author(s)
Ibrahim Alabdulmohsin,
Xiaohua Zhai,
Alexander Kolesnikov,
Lucas Beyer
Publication year: 2024
Scaling laws have recently been employed to derive the compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSVRC-2012, surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, while also requiring less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA, and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations. Overall, our findings challenge the prevailing approach of blindly scaling up vision models and pave a path for more informed scaling.
Language(s): English
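
To make the size-versus-shape trade-off concrete, below is a minimal, hypothetical sketch of the kind of scaling-law fit the abstract alludes to: it fits an additive power law loss(N, D) over parameter count N and examples seen D from a handful of runs, then, under a fixed compute budget, picks the N (and an implied width at fixed depth) with the lowest predicted loss. The functional form, the C ≈ 6·N·D compute rule, the params ≈ 12·depth·width² count, and every constant are illustrative assumptions, not the paper's actual procedure or numbers.

    # Hypothetical scaling-law sketch (illustrative only, not the paper's method).
    import numpy as np
    from scipy.optimize import curve_fit

    def loss_law(X, a, alpha, b, beta, c):
        """Assumed additive power law: loss = a*N^-alpha + b*D^-beta + c."""
        n, d = X
        return a * n ** (-alpha) + b * d ** (-beta) + c

    # Synthetic stand-ins for (model params N, examples seen D, validation loss).
    rng = np.random.default_rng(0)
    N = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
    D = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
    Ng, Dg = np.meshgrid(N, D)
    clean = loss_law((Ng.ravel(), Dg.ravel()), 400.0, 0.34, 410.0, 0.28, 1.7)
    y = clean + rng.normal(0.0, 0.01, clean.shape)

    params, _ = curve_fit(
        loss_law, (Ng.ravel(), Dg.ravel()), y,
        p0=(300.0, 0.3, 300.0, 0.3, 1.0), maxfev=50000,
    )

    # At a fixed compute budget C ~ 6*N*D, more parameters mean fewer examples;
    # sweep N, back out D from the budget, and take the predicted minimum.
    budget = 1e21
    n_grid = np.logspace(7, 10, 400)
    d_grid = budget / (6.0 * n_grid)
    pred = loss_law((n_grid, d_grid), *params)
    n_opt = n_grid[np.argmin(pred)]

    # Map the optimal size to a width, assuming a fixed depth and the rough
    # transformer count params ~ 12 * depth * width^2 (embeddings ignored).
    depth = 32
    width_opt = np.sqrt(n_opt / (12 * depth))
    print(f"optimal params ~ {n_opt:.3g}, implied width ~ {width_opt:.0f} at depth {depth}")

The paper optimizes several shape dimensions jointly (e.g. width, depth, MLP size) rather than a single size scalar; the sketch above only shows the one-dimensional, fixed-depth case to illustrate how a fitted law plus a compute constraint yields an interior optimum instead of "bigger is always better".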
