Open Access
General-Purpose In-Context Learning by Meta-Learning Transformers
Author(s)
Louis Kirsch,
James Harrison,
Jascha Sohl-Dickstein,
Luke Metz
Publication year
2024
Abstract
Modern machine learning requires system designers to specify aspects of the learning pipeline, such as losses, architectures, and optimizers. Meta-learning, or learning-to-learn, instead aims to learn those aspects, and promises to unlock greater capabilities with less manual effort. One particularly ambitious goal of meta-learning is to train general-purpose in-context learning algorithms from scratch, using only black-box models with minimal inductive bias. Such a model takes in training data, and produces test-set predictions across a wide range of problems, without any explicit definition of an inference model, training loss, or optimization algorithm. In this paper we show that Transformers and other black-box models can be meta-trained to act as general-purpose in-context learners. We characterize transitions between algorithms that generalize, algorithms that memorize, and algorithms that fail to meta-train at all, induced by changes in model size, number of tasks, and meta-optimization. We further show that the capabilities of meta-trained algorithms are bottlenecked by the accessible state size (memory) determining the next prediction, unlike standard models which are thought to be bottlenecked by parameter count. Finally, we propose practical interventions such as biasing the training distribution that improve the meta-training and meta-generalization of general-purpose in-context learning algorithms.
Language(s)
English
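
The abstract describes meta-training a black-box sequence model so that, at inference time, it maps a context of training examples directly to test predictions, with no hand-specified inference model, loss, or optimizer inside the learner itself. The sketch below is not the authors' code: it assumes a synthetic distribution of linear-regression tasks, a plain PyTorch Transformer encoder, and illustrative names and hyperparameters, purely to make the meta-training setup concrete.

```python
# Minimal sketch of meta-training a Transformer as an in-context learner.
# Assumptions (not from the paper): linear-regression tasks, MSE meta-objective,
# and the hyperparameters below.
import torch
import torch.nn as nn

D_X, D_MODEL, SEQ_LEN, TASKS_PER_BATCH = 8, 64, 16, 32

class InContextLearner(nn.Module):
    def __init__(self):
        super().__init__()
        # Each token carries (x_t, y_{t-1}); the current label is withheld so the
        # model must infer it from the earlier examples in the sequence.
        self.embed = nn.Linear(D_X + 1, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.readout = nn.Linear(D_MODEL, 1)

    def forward(self, xs, ys):
        # Shift labels right by one position so position t only sees y_0..y_{t-1}.
        prev_y = torch.cat([torch.zeros_like(ys[:, :1]), ys[:, :-1]], dim=1)
        tokens = self.embed(torch.cat([xs, prev_y], dim=-1))
        n = xs.shape[1]
        causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=causal)
        return self.readout(h)  # a prediction of y_t at every position t

def sample_tasks(batch_size):
    # One fresh "task" per sequence: y = w.x with a newly drawn weight vector w.
    w = torch.randn(batch_size, D_X, 1)
    xs = torch.randn(batch_size, SEQ_LEN, D_X)
    return xs, xs @ w

model = InContextLearner()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1_000):
    xs, ys = sample_tasks(TASKS_PER_BATCH)
    loss = ((model(xs, ys) - ys) ** 2).mean()  # averaged over all context lengths
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point mirrored from the abstract is that nothing task-specific is hard-coded in the learner: the same forward pass would apply to any task distribution that can be serialized into (input, label) sequences, and learning at test time happens purely in-context through the model's activations.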
