Open Access
Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit
Author(s): Fanfei Meng, Lele Zhang, Yu Chen, Yuxin Wang
Publication year: 2024
Transformers require a fixed number of layers and heads, which makes them inflexible to the complexity of individual samples and expensive in training and inference. To address this, we propose a sample-based Dynamic Hierarchical Transformer (DHT) model whose layers and heads can be dynamically configured for individual data samples by solving contextual bandit problems. To determine the number of layers and heads, we use the Upper Confidence Bound, while we deploy combinatorial Thompson Sampling to select specific head combinations given their number. Unlike previous work that focuses on compressing trained networks for inference only, DHT not only adaptively optimizes the underlying network architecture during training but also yields a flexible network for efficient inference. To the best of our knowledge, this is the first comprehensive data-driven dynamic transformer that requires no additional auxiliary neural networks to implement the dynamic system. According to the experimental results, we achieve up to 74% computational savings for both training and inference with a minimal loss of accuracy.
Language(s): English
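
The abstract combines two bandit routines: an Upper Confidence Bound (UCB) rule chooses the number of layers and heads for each sample, and combinatorial Thompson Sampling picks which specific heads to activate given that number. The Python sketch below illustrates this division of labor under simplifying assumptions; the class name LayerHeadBandit, the Beta-posterior model for head quality, and all parameter values are hypothetical illustrations, not the paper's actual implementation.

import numpy as np

# Illustrative sketch only: a simplified contextual-bandit controller in the
# spirit of the DHT abstract. Names and reward model are assumptions.
class LayerHeadBandit:
    def __init__(self, layer_options, n_heads, exploration=2.0, seed=0):
        self.layer_options = layer_options      # e.g. [2, 4, 6] candidate depths
        self.n_heads = n_heads                  # total heads available
        self.c = exploration                    # UCB exploration coefficient
        self.rng = np.random.default_rng(seed)
        # UCB statistics over the "how many layers" arms
        self.counts = np.zeros(len(layer_options))
        self.values = np.zeros(len(layer_options))
        # Beta posteriors for per-head Thompson Sampling
        self.alpha = np.ones(n_heads)
        self.beta = np.ones(n_heads)
        self.t = 0

    def choose(self, k):
        """Pick a depth via UCB, then a k-subset of heads via Thompson Sampling."""
        self.t += 1
        if np.any(self.counts == 0):
            # Play every depth arm at least once before trusting the bonus term
            arm = int(np.argmin(self.counts))
        else:
            bonus = self.c * np.sqrt(np.log(self.t) / self.counts)
            arm = int(np.argmax(self.values + bonus))
        # Combinatorial Thompson Sampling: sample each head's Beta posterior
        # and keep the k highest draws as the active head combination.
        draws = self.rng.beta(self.alpha, self.beta)
        heads = np.argsort(draws)[-k:]
        return self.layer_options[arm], arm, heads

    def update(self, arm, heads, reward):
        """Reward in [0, 1], e.g. a normalized accuracy/compute trade-off."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
        self.alpha[heads] += reward
        self.beta[heads] += 1.0 - reward

A training loop would call choose() before each sample's forward pass, run the sample through only the selected layers and heads, and feed update() a reward that trades accuracy against compute, in line with the accuracy/cost balance the abstract reports.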
