Code Compression for VLIW Processors
Author(s) -
Yuan Xie,
Haris Lekatsas,
Wayne H. Wolf
Publication year - 2001
Language(s) - English
DOI - 10.1109/dcc.2001.10049
Code compression is an important issue in embedded system design, since memory is one of the most constrained resources. Most previous work on code compression has targeted RISC architectures, although VLIW processors have gained considerable popularity recently. In this research, we explore approaches to compressing code for VLIW processors. Previous code compression algorithms for VLIW are dictionary-based schemes that target traditional VLIW architectures with rigid instruction word formats. Based on an arithmetic coding algorithm, we present several compression schemes for modern VLIW architectures, whose instruction formats are highly flexible in order to achieve good code density. The compression algorithm we use is a reduced-precision arithmetic coder combined with a Markov model [1]. We investigated several modern VLIW architectures, selecting Texas Instruments' TMS320C6x and Lucent's StarCore SC140 DSP, as well as Intel/HP's IA-64.

Modern VLIW ISAs adopt a VLES (variable-length execution set) scheme to achieve high code density at minimal cost. There are two kinds of instruction packets in the code: fetch packets and execute packets. A fetch packet consists of a fixed number of instructions that are always fetched together, while the instructions that execute in parallel constitute an execute packet. Based on the characteristics of modern VLIW instruction formats, we propose two schemes to improve the compression ratio. The first increases the compression block size to the size of a fetch packet; the second is a multiple-model scheme, which works for VLIW ISAs with more restricted instruction formats, such as IA-64. The multiple-model scheme constructs different models for different parts of the long instruction word. Fast decompression is particularly important for VLIW machines.
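The modeling idea behind such a coder can be sketched briefly. The following Python fragment (an illustrative sketch, not the authors' implementation) builds a first-order Markov model over byte-granularity symbols; a real coder would feed these conditional frequencies to the reduced-precision arithmetic coder, and the multiple-model scheme would keep one such model per field of the long instruction word:

```python
from collections import defaultdict

class MarkovModel:
    """First-order Markov model: symbol frequencies conditioned on the
    previous symbol. This sketch only builds the statistics; the
    arithmetic coding stage itself is omitted."""

    def __init__(self):
        # counts[context][symbol] = occurrences of symbol after context
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, data: bytes) -> None:
        context = 0  # fixed start context (an assumption of this sketch)
        for symbol in data:
            self.counts[context][symbol] += 1
            context = symbol

    def probability(self, context: int, symbol: int) -> float:
        """Estimated probability of `symbol` given the previous symbol."""
        total = sum(self.counts[context].values())
        if total == 0:
            return 0.0
        return self.counts[context][symbol] / total

# Under the multiple-model scheme, structurally different parts of the
# instruction word (e.g. template vs. slot fields in IA-64 bundles)
# would each train their own MarkovModel instance.
model = MarkovModel()
model.train(bytes([1, 2, 1, 2, 1, 3]))
```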
Because instruction words are long, existing code compression algorithms introduce long delays into instruction execution during decompression. We therefore also explore the trade-offs between compression ratio and decompression speed. To enable parallel decompression of each fetch packet, we divide the fetch packet into sub-blocks and attach a tag in front of each compressed sub-block to indicate its size, so that the decompressor knows the location of the next compressed sub-block. Another scheme takes a vertical compression approach: all the fetch packets in the code are divided into several streams, and we compress each stream in parallel. Since each stream has its own LAT (line address table), which records each compressed block's position, we do not need to attach a tag to each compressed block. This is also a multiple-model approach, since each stream has its own statistical model. Both schemes sacrifice some compression ratio for faster decompression speed.

Reference: [1] H. Lekatsas and W. Wolf, "SAMC: A code compression algorithm for embedded processors", IEEE Transactions on CAD, December 1999.

Yuan Xie, Wayne Wolf
Electrical Engineering Department, Princeton University, Princeton, NJ 08540
{yuanxie,wolf}@ee.princeton.edu

Haris Lekatsas
NEC USA, 4 Independence Way, Princeton, NJ 08540
lekatsas@ccrl.nj.nec.com
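The tag-prefixed sub-block scheme can be illustrated as follows. This Python sketch uses zlib purely as a stand-in for the reduced-precision arithmetic coder, and the sub-block count and one-byte tag width are assumptions of the sketch, not the paper's parameters:

```python
import zlib

SUB_BLOCKS_PER_PACKET = 4  # assumed split; the paper's choice may differ

def compress_fetch_packet(packet: bytes) -> bytes:
    """Split a fetch packet into sub-blocks and prefix each compressed
    sub-block with a one-byte length tag, so a decompressor can locate
    every sub-block up front and hand them to parallel decoders."""
    size = len(packet) // SUB_BLOCKS_PER_PACKET
    out = bytearray()
    for i in range(SUB_BLOCKS_PER_PACKET):
        block = zlib.compress(packet[i * size:(i + 1) * size])
        assert len(block) < 256  # the tag is one byte in this sketch
        out.append(len(block))   # tag: size of the compressed sub-block
        out += block
    return bytes(out)

def decompress_fetch_packet(data: bytes) -> bytes:
    """Walk the tags to find each compressed sub-block; the sub-blocks
    are independent, so they could be decompressed concurrently."""
    out, pos = bytearray(), 0
    for _ in range(SUB_BLOCKS_PER_PACKET):
        tag = data[pos]
        out += zlib.decompress(data[pos + 1:pos + 1 + tag])
        pos += 1 + tag
    return bytes(out)
```

The tags are the overhead this scheme trades for speed: they cost a few bytes per fetch packet, which is why it gives a worse compression ratio than a single sequentially decoded block.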