TMA: Tera‐MACs/W neural hardware inference accelerator with a multiplier‐less massive parallel processor
Author(s) -
Park Hyunbin,
Kim Dohyun,
Kim Shiho
Publication year - 2021
Publication title -
International Journal of Circuit Theory and Applications
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.364
H-Index - 52
eISSN - 1097-007X
pISSN - 0098-9886
DOI - 10.1002/cta.2917
Subject(s) - computer science, artificial neural network, hardware acceleration, benchmark, parallel computing, scalability, multiplier, field programmable gate array, computer hardware, tera, Virtex, efficient energy use, computer engineering, embedded system, artificial intelligence, engineering, electrical engineering
Summary - Computationally intensive inference tasks of deep neural networks have driven a revolution in accelerator architecture aimed at reducing both power consumption and latency. The key figure‐of‐merit for hardware inference accelerators is the number of multiply‐and‐accumulate (MAC) operations per second per watt (MACs/W); so far, the state‐of‐the‐art has reached several hundred Giga‐MACs/W. We propose a Tera‐MACs/W neural hardware inference accelerator (TMA) with 8‐bit activations and scalable integer weights of less than 1 byte. The architecture's main feature is a configurable neural processing element for matrix‐vector operations. This processing element is built on a multiplier‐less, massively parallel processor, which makes it attractive for energy‐efficient, high‐performance neural network applications. We benchmark the system's latency, power, and performance using AlexNet trained on ImageNet, and compare the accelerator's throughput and power consumption with those of prior works. The proposed accelerator outperforms state‐of‐the‐art counterparts in energy and area efficiency, achieving 2.3 TMACs/W at 1.0 V on a 28‐nm Virtex‐7 FPGA.
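The abstract does not say how the processing element computes products without multipliers. One common technique, assumed here purely for illustration and not taken from the TMA design, is shift‐and‐add: each set bit of a two's‐complement weight contributes a shifted copy of the activation, and the sign bit contributes a negated one. The sketch below applies this to a matrix‐vector product with int8 activations and weights whose bit width is a parameter (assumed 2 to 8 bits). The function names `shift_add_mul` and `matvec_multiplierless` are hypothetical.

```python
from typing import List


def shift_add_mul(activation: int, weight: int, weight_bits: int) -> int:
    """Multiply an int8 activation by a signed weight using only shifts and adds.

    Assumption (illustrative, not the paper's method): the weight is a
    two's-complement integer of `weight_bits` bits, with 2 <= weight_bits <= 8.
    """
    if not 2 <= weight_bits <= 8:
        raise ValueError("weight_bits must be in 2..8")
    if not -128 <= activation <= 127:
        raise ValueError("activation must fit in int8")
    lo, hi = -(1 << (weight_bits - 1)), (1 << (weight_bits - 1)) - 1
    if not lo <= weight <= hi:
        raise ValueError("weight out of range for weight_bits")

    pattern = weight & ((1 << weight_bits) - 1)  # two's-complement bit pattern
    acc = 0
    # Magnitude bits: bit k adds activation * 2^k via a left shift.
    for bit in range(weight_bits - 1):
        if (pattern >> bit) & 1:
            acc += activation << bit
    # Sign bit carries weight -2^(n-1), so its shifted copy is subtracted.
    if (pattern >> (weight_bits - 1)) & 1:
        acc -= activation << (weight_bits - 1)
    return acc


def matvec_multiplierless(
    weights: List[List[int]], x: List[int], weight_bits: int
) -> List[int]:
    """Compute y = W @ x, where every product is formed by shift_add_mul."""
    out = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match vector length")
        # Accumulate the shift-and-add products for one output element.
        out.append(sum(shift_add_mul(a, w, weight_bits) for w, a in zip(row, x)))
    return out


if __name__ == "__main__":
    W = [[1, -2, 3], [-4, 0, 7]]  # 4-bit signed weights (range -8..7)
    x = [10, -5, 2]               # int8 activations
    print(matvec_multiplierless(W, x, weight_bits=4))  # [26, -26]
```

In this sketch, the work per product grows with `weight_bits`, which hints at why narrower weights can lower cost in a multiplier‐less datapath. A hardware version could evaluate the bit terms in parallel adder trees or serially over clock cycles; which (if either) TMA uses is not stated in the abstract.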