Open Access
Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit
Author(s): Fanfei Meng, Lele Zhang, Yu Chen, Yuxin Wang
Publication year: 2024
Transformers require a fixed number of layers and heads, which makes them inflexible to the complexity of individual samples and expensive in training and inference. To address this, we propose a sample-based Dynamic Hierarchical Transformer (DHT) model whose layers and heads can be dynamically configured for individual data samples by solving contextual bandit problems. To determine the number of layers and heads, we use the Upper Confidence Bound, while we deploy combinatorial Thompson Sampling to select specific head combinations given their number. Unlike previous work that focuses on compressing trained networks for inference only, DHT not only adaptively optimizes the underlying network architecture during training but also yields a flexible network for efficient inference. To the best of our knowledge, this is the first comprehensive data-driven dynamic transformer that requires no additional auxiliary neural networks to implement the dynamic system. According to the experimental results, we achieve up to 74% computational savings for both training and inference with a minimal loss of accuracy.
Language(s): English
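
The abstract combines two bandit routines: an Upper Confidence Bound (UCB) rule chooses the number of layers and heads for each sample, and combinatorial Thompson Sampling picks which specific heads to activate given that number. The Python sketch below illustrates this division of labor under simplifying assumptions; the class name LayerHeadBandit, the Beta-posterior model for head quality, and all parameter values are hypothetical illustrations, not the paper's actual implementation.

import numpy as np

# Illustrative sketch only: a simplified contextual-bandit controller in the
# spirit of the DHT abstract. Names and reward model are assumptions.
class LayerHeadBandit:
    def __init__(self, layer_options, n_heads, exploration=2.0, seed=0):
        self.layer_options = layer_options      # e.g. [2, 4, 6] candidate depths
        self.n_heads = n_heads                  # total heads available
        self.c = exploration                    # UCB exploration coefficient
        self.rng = np.random.default_rng(seed)
        # UCB statistics over the "how many layers" arms
        self.counts = np.zeros(len(layer_options))
        self.values = np.zeros(len(layer_options))
        # Beta posteriors for per-head Thompson Sampling
        self.alpha = np.ones(n_heads)
        self.beta = np.ones(n_heads)
        self.t = 0

    def choose(self, k):
        """Pick a depth via UCB, then a k-subset of heads via Thompson Sampling."""
        self.t += 1
        if np.any(self.counts == 0):
            # Play every depth arm at least once before trusting the bonus term
            arm = int(np.argmin(self.counts))
        else:
            bonus = self.c * np.sqrt(np.log(self.t) / self.counts)
            arm = int(np.argmax(self.values + bonus))
        # Combinatorial Thompson Sampling: sample each head's Beta posterior
        # and keep the k highest draws as the active head combination.
        draws = self.rng.beta(self.alpha, self.beta)
        heads = np.argsort(draws)[-k:]
        return self.layer_options[arm], arm, heads

    def update(self, arm, heads, reward):
        """Reward in [0, 1], e.g. a normalized accuracy/compute trade-off."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
        self.alpha[heads] += reward
        self.beta[heads] += 1.0 - reward

A training loop would call choose() before each sample's forward pass, run the sample through only the selected layers and heads, and feed update() a reward that trades accuracy against compute, in line with the accuracy/cost balance the abstract reports.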
