Efficient and retargetable SIMD translation in a dynamic binary translator

Fu ShengYu, Hong DingYong, Liu YuPing, Wu JanJan, Hsu WeiChung


Author(s) - Fu ShengYu, Hong DingYong, Liu YuPing, Wu JanJan, Hsu WeiChung

Publication year - 2018

Publication title - Software: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.437

H-Index - 70

eISSN - 1097-024X

pISSN - 0038-0644

DOI - 10.1002/spe.2573

Subject(s) - SIMD, computer science, parallel computing, binary translation, x86, scalability, operating system, software

Summary - The single-instruction multiple-data (SIMD) capability of modern processors continues to improve, delivering ever better performance and power efficiency. For example, Intel has increased SIMD register lengths from 128 bits in the Streaming SIMD Extensions (SSE) to 512 bits in AVX-512. The ARM Scalable Vector Extension (SVE) supports SIMD register lengths of up to 2048 bits and includes predicated instructions. However, the translation of SIMD instructions in dynamic binary translation has not received similar attention. For example, the widely used QEMU emulates guest SIMD instructions with a sequence of scalar instructions, even when the host machine has equivalent SIMD instructions. This leaves significant potential for performance improvement. We propose a newly designed SIMD translation framework for dynamic binary translation that takes advantage of the host's SIMD capabilities. The proposed framework has been built in HQEMU, an enhanced QEMU with a separate thread for applying LLVM optimizations. The current prototype supports ARMv7, ARMv8, and IA-32 guests on an x86-64 host with AVX2. Compared with the scalar-translation version of HQEMU, our framework runs up to 1.84 times faster on the SPEC CPU2006 floating-point (CFP) benchmarks and up to 6.81 times faster on selected real applications.
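The difference between scalar emulation and whole-register translation of a guest SIMD instruction can be illustrated with a small toy model. The sketch below is not taken from the paper, QEMU, or HQEMU; all names (`pack`, `unpack`, `vadd_scalar`, `vadd_packed`) are hypothetical. It assumes a 128-bit guest register holding four 32-bit integer lanes. Because Python has no SIMD instructions, the "packed" version uses a carry-masking trick on one 128-bit integer as a stand-in for a single host vector add.

```python
LANES = 4                            # four 32-bit lanes in one 128-bit register
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1


def pack(lanes):
    """Pack a list of lane values (lane 0 first) into one 128-bit integer."""
    reg = 0
    for i, value in enumerate(lanes):
        reg |= (value & LANE_MASK) << (i * LANE_BITS)
    return reg


def unpack(reg):
    """Split a 128-bit integer back into its four 32-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def vadd_scalar(a, b):
    """Scalar-style emulation: one extract/add/insert sequence per lane."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        x = (a >> shift) & LANE_MASK          # extract lane i of a
        y = (b >> shift) & LANE_MASK          # extract lane i of b
        result |= ((x + y) & LANE_MASK) << shift  # wrap-around add, reinsert
    return result


HIGH_BITS = pack([0x80000000] * LANES)   # top bit of every lane
LOW_BITS = pack([0x7FFFFFFF] * LANES)    # low 31 bits of every lane


def vadd_packed(a, b):
    """Whole-register add (stand-in for one host SIMD instruction).

    Adds the low 31 bits of all lanes at once, so no carry can cross a
    lane boundary, then fixes up each lane's top bit with XOR.
    """
    return ((a & LOW_BITS) + (b & LOW_BITS)) ^ ((a ^ b) & HIGH_BITS)


a = pack([1, 0xFFFFFFFF, 0x80000000, 7])
b = pack([2, 1, 0x80000000, 8])
print(unpack(vadd_scalar(a, b)))
print(unpack(vadd_packed(a, b)))
```

Both functions compute the same lane-wise wrap-around sum; the scalar version does work proportional to the number of lanes, while the packed version handles the whole register in a fixed number of operations, which is the kind of saving a host SIMD instruction provides.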
