Open Access
Novel Approximate Floating-point Adder with runtime tunable precision
Author(s) -
L. Tegazzini,
G. Di Meo,
A. Torino,
F. Del Prete,
D. De Caro,
C. Parrella,
A. G. M. Strollo
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3621071
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
This paper proposes a novel low-power approximate floating-point adder with runtime-configurable precision. The design combines skipping addition, triggered when the exponent difference exceeds a programmable threshold, with tunable mantissa rounding that reduces switching activity. We present a comprehensive analysis to identify the combination of skipping and rounding thresholds that gives the best balance between precision and power consumption. The thresholds are set by an accuracy control signal, which can be changed at runtime to reach the desired power-precision trade-off. A detailed comparison with state-of-the-art designs was carried out in the FP32 and FP16 formats, for implementations in 28 nm technology. The proposed approximate floating-point adder outperforms existing approaches in the power-precision trade-off, while also offering the unique advantage of runtime configurability. This flexibility comes at the cost of a moderate area increase, mainly due to the additional control logic. Power savings over the exact adder range from 45% to 60% in FP32 and from 20% to 53% in FP16, with the mean relative error distance (MRED) tunable between 1.34×10⁻⁸ and 6.34×10⁻⁶ in FP32 and between 5.07×10⁻⁴ and 1.78×10⁻² in FP16. The effectiveness of the configurable adder is validated in three practical applications: JPEG image compression, convolution-based image processing (filtering and edge detection), and time-sequence prediction with a Long Short-Term Memory (LSTM) recurrent neural network. In all tests the proposed adder consistently delivered strong performance, confirming the practical relevance of the proposed technique.
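The skipping-and-rounding idea summarised above can be illustrated with a small behavioural model. The Python sketch below is not the authors' hardware design. It assumes that skipping returns the operand with the larger exponent unchanged, that rounding shortens both input significands to a chosen number of fraction bits before an ordinary FP32 addition, and that MRED is measured against a reference FP32 sum. The names approx_add, round_fraction, mred and the ACCURACY_LEVELS table are hypothetical, and the threshold values in that table are illustrative only, not taken from the paper. Infinities, NaNs and overflow are not modelled.

```python
import math
import random
import struct

# HYPOTHETICAL mapping from an accuracy-control level to
# (skip_threshold, kept fraction bits). Values are illustrative, not from the paper.
ACCURACY_LEVELS = {0: (30, 23), 1: (16, 16), 2: (8, 10), 3: (4, 6)}


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_exponent(x: float) -> int:
    """Unbiased binary32 exponent field (zero and subnormals map to -127)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return ((bits >> 23) & 0xFF) - 127


def round_fraction(x: float, frac_bits: int) -> float:
    """Round x to frac_bits fraction bits (round-half-to-even on the significand)."""
    f, e = math.frexp(x)                # x = f * 2**e with 0.5 <= |f| < 1
    scale = 2 ** (frac_bits + 1)        # hidden bit + frac_bits fraction bits
    return math.ldexp(round(f * scale) / scale, e)


def approx_add(a: float, b: float, skip_threshold: int, frac_bits: int) -> float:
    """Behavioural sketch of a skip-and-round approximate FP32 adder."""
    a, b = to_f32(a), to_f32(b)
    ea, eb = f32_exponent(a), f32_exponent(b)
    if abs(ea - eb) > skip_threshold:
        # Skipping (assumed behaviour): the operand with the smaller exponent is dropped.
        return a if ea > eb else b
    # Rounding (assumed placement): both input significands are shortened first.
    return to_f32(round_fraction(a, frac_bits) + round_fraction(b, frac_bits))


def mred(pairs, skip_threshold: int, frac_bits: int) -> float:
    """Mean relative error distance against a reference FP32 addition."""
    errors = []
    for a, b in pairs:
        exact = to_f32(to_f32(a) + to_f32(b))
        if exact == 0.0:
            continue  # relative error is undefined for a zero reference
        approx = approx_add(a, b, skip_threshold, frac_bits)
        errors.append(abs(approx - exact) / abs(exact))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(10_000)]
    for level, (skip, bits) in ACCURACY_LEVELS.items():
        print(f"level {level}: skip if exponent gap > {skip}, {bits} fraction bits, "
              f"MRED = {mred(pairs, skip, bits):.3e}")
```

Running the script prints one MRED value per hypothetical accuracy level. In hardware, the two thresholds would instead be driven by the accuracy control signal; this model only captures the numerical effect of skipping and rounding, not the resulting power savings.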
