General-Purpose In-Context Learning by Meta-Learning Transformers (Open Access)
Author(s)
Louis Kirsch,
James Harrison,
Jascha Sohl-Dickstein,
Luke Metz
Publication year: 2024
Modern machine learning requires system designers to specify aspects of the learning pipeline, such as losses, architectures, and optimizers. Meta-learning, or learning-to-learn, instead aims to learn those aspects, and promises to unlock greater capabilities with less manual effort. One particularly ambitious goal of meta-learning is to train general-purpose in-context learning algorithms from scratch, using only black-box models with minimal inductive bias. Such a model takes in training data, and produces test-set predictions across a wide range of problems, without any explicit definition of an inference model, training loss, or optimization algorithm. In this paper we show that Transformers and other black-box models can be meta-trained to act as general-purpose in-context learners. We characterize transitions between algorithms that generalize, algorithms that memorize, and algorithms that fail to meta-train at all, induced by changes in model size, number of tasks, and meta-optimization. We further show that the capabilities of meta-trained algorithms are bottlenecked by the accessible state size (memory) determining the next prediction, unlike standard models which are thought to be bottlenecked by parameter count. Finally, we propose practical interventions such as biasing the training distribution that improve the meta-training and meta-generalization of general-purpose in-context learning algorithms.
Language(s): English