Adaptive Semiparametric Language Models
Author(s) - Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong
Publication year - 2021
Publication title - Transactions of the Association for Computational Linguistics
Language(s) - English
Resource type - Journals
ISSN - 2307-387X
DOI - 10.1162/tacl_a_00371
Subject(s) - computer science, language model, artificial intelligence, transformer, natural language processing, machine learning, speech recognition, statistics
We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component in an integrated architecture. Our model uses extended short-term context by caching local hidden states—similar to transformer-XL—and global long-term memory by retrieving a set of nearest neighbor tokens at each timestep. We design a gating function to adaptively combine multiple information sources to make a prediction. This mechanism allows the model to use either local context, short-term memory, or long-term memory (or any combination of them) on an ad hoc basis depending on the context. Experiments on word-based and character-based language modeling datasets demonstrate the efficacy of our proposed method compared to strong baselines.
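The gating mechanism described above can be illustrated with a minimal sketch: three next-token distributions (from local context, the short-term cache, and long-term retrieval) are mixed with context-dependent gate weights that sum to one. The parameterization below (a single linear projection `W_g` of the hidden state into three gate logits) is a hypothetical simplification for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gated_mixture(h, p_local, p_short, p_long, W_g):
    """Adaptively combine three information sources into one distribution.

    h       : hidden state at the current timestep, shape (d,)
    p_local : distribution from the local (parametric) context, shape (V,)
    p_short : distribution from the short-term cache, shape (V,)
    p_long  : distribution from long-term nearest-neighbor memory, shape (V,)
    W_g     : hypothetical gate projection, shape (d, 3)
    """
    g = softmax(h @ W_g)                             # (3,) gates, sum to 1
    sources = np.stack([p_local, p_short, p_long])   # (3, V)
    return g @ sources                               # (V,) convex mixture

# Toy usage with random inputs.
rng = np.random.default_rng(0)
d, V = 8, 10

def random_dist():
    p = rng.random(V)
    return p / p.sum()

h = rng.normal(size=d)
W_g = rng.normal(size=(d, 3))
p = gated_mixture(h, random_dist(), random_dist(), random_dist(), W_g)
```

Because the gates are a softmax and each source is itself a distribution, the output is a valid probability distribution; when one gate saturates, the model effectively relies on that single source, matching the ad hoc source selection the abstract describes.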
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom