
FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs
Open Access
Author(s)
Shulin Zeng,
Jun Liu,
Guohao Dai,
Xinhao Yang,
Tianyu Fu,
Hongyi Wang,
Wenheng Ma,
Hanbo Sun,
Shiyao Li,
Zixiao Huang,
Yadong Dai,
Jintao Li,
Zehao Wang,
Ruoyu Zhang,
Kairui Wen,
Xuefei Ning,
Yu Wang
Publication year
2024
Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLMs' efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLMs' computation/memory overheads and hardware capacity. However, existing GPU and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency, underutilized memory bandwidth, and large compilation overheads. This paper proposes FlightLLM, enabling efficient LLM inference with a complete mapping flow on FPGAs. In FlightLLM, we highlight an innovative solution: the computation and memory overheads of LLMs can be addressed by utilizing FPGA-specific resources (e.g., DSP48 and heterogeneous memory hierarchy). First, we propose a configurable sparse DSP chain to support different sparsity patterns with high computation efficiency. Second, we propose an always-on-chip decode scheme to boost memory bandwidth with mixed-precision support. Finally, to make FlightLLM available for real-world LLMs, we propose a length adaptive compilation method to reduce the compilation overhead. Implemented on the Xilinx Alveo U280 FPGA, FlightLLM achieves 6.0$\times$ higher energy efficiency and 1.8$\times$ better cost efficiency against commercial GPUs (e.g., NVIDIA V100S) on modern LLMs (e.g., LLaMA2-7B) using vLLM and SmoothQuant at a batch size of one. Using the latest Versal VHK158 FPGA, FlightLLM beats the NVIDIA A100 GPU with 1.2$\times$ higher throughput.
Language(s)
English
