Open Access
Design and Implementation of Cache Memory with Dual Unit Tile/Line Accessibility
Author(s) - Baokang Wang
Publication year - 2019
Publication title - Mathematical Problems in Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.262
H-Index - 62
eISSN - 1026-7077
pISSN - 1024-123X
DOI - 10.1155/2019/9601961
Subject(s) - computer science, cache, parallel computing, cpu cache, translation lookaside buffer, cache coloring, page cache, cache pollution, cache algorithms, computer hardware, embedded system, physical address, semiconductor memory
In recent years, the increasing disparity between the data access speed of cache memory and the processing speed of processors has become a major bottleneck for high-performance 2-dimensional (2D) data processing, such as that in scientific computing and image processing. To address this problem, this paper proposes a new dual unit tile/line access cache memory based on a hierarchical hybrid Z-ordering data layout and a multibank cache organization supporting skewed storage schemes. The proposed layout improves 2D data locality and efficiently reduces L1 cache misses and Translation Lookaside Buffer (TLB) misses, and it is obtained from the conventional raster layout by a simple hardware-based address translation unit. In addition, we propose an aligned tile set replacement algorithm (ATSRA) to reduce the hardware overhead of the tag memory in the proposed cache. Simulation results using Matrix Multiplication (MM) show that the proposed cache with parallel unit tile/line accessibility reduces both L1 cache misses and TLB misses considerably compared with the conventional raster layout and the Z-Morton order layout. The number of load instructions required for parallel unit tile/line access was reduced to about one-fourth of that required with conventional load instructions, and the execution time for parallel load instructions was reduced to about one-third of that for conventional load instructions. Using 40 nm Complementary Metal-Oxide-Semiconductor (CMOS) technology, we combined the proposed cache with a SIMD-based data path and designed a 5 × 5 mm² Large-Scale Integration (LSI) chip. With the ATSRA method, the total hardware overhead of the proposed ATSRA-cache was held to only 105% of that of a conventional cache.
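
The hierarchical hybrid Z-ordering layout mentioned in the abstract builds on the basic Z-Morton mapping, in which the linear address of a 2D element is obtained by interleaving the bits of its row and column indices so that small square tiles occupy contiguous memory. The following is a minimal C sketch of only that underlying bit-interleaving step; the 16-bit index width, element-granular addressing, and function names are assumptions made for illustration, and the paper's actual hierarchical hybrid layout and hardware address translation unit are not reproduced here.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Spread the low 16 bits of x into the even bit positions of a 32-bit word. */
static uint32_t spread_bits(uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Z-Morton index: column bits occupy the even positions and row bits the odd
   positions, so the elements of each 2^k x 2^k tile are contiguous in memory. */
static uint32_t z_order(uint32_t row, uint32_t col)
{
    return spread_bits(col) | (spread_bits(row) << 1);
}

int main(void)
{
    /* The top-left 2 x 2 tile maps to the four consecutive indices 0..3. */
    for (uint32_t r = 0; r < 2; ++r)
        for (uint32_t c = 0; c < 2; ++c)
            printf("(%" PRIu32 ",%" PRIu32 ") -> %" PRIu32 "\n", r, c, z_order(r, c));
    return 0;
}

Because the interleaving keeps each 2^k × 2^k tile contiguous, a tile-granular access touches far fewer cache lines and pages than the same tile stored in raster order, which is the locality property the proposed layout exploits.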
