Open Access
Low-Latency Keccak at any Arbitrary Order
Author(s) -
Sara Zarei,
Aein Rezaei Shahmirzadi,
Hadi Soleimany,
Raziyeh Salarifard,
Amir Moradi
Publication year - 2021
Publication title -
iacr transactions on cryptographic hardware and embedded systems
Language(s) - English
Resource type - Journals
ISSN - 2569-2925
DOI - 10.46586/tches.v2021.i4.388-411
Subject(s) - computer science , latency (audio) , masking (illustration) , implementation , glitch , cryptography , parallel computing , arithmetic , computer engineering , algorithm , mathematics , programming language , telecommunications , art , detector , visual arts
Correct application of masking on hardware implementation of cryptographic primitives necessitates the instantiation of registers in order to achieve the non-completeness (commonly said to stop the propagation of glitches). This sometimes leads to a high latency overhead, making the implementation not necessarily suitable for the underlying application. As a concrete example, this holds for Keccak. Application of d + 1 Domain Oriented Masking (DOM) on a round-based implementation of Keccak leads to the introduction of two register stages per round, i.e., two times higher latency. On the other hand, Rhythmic-Keccak, introduced in CHES 2018, unrolls two rounds to half the latency compared to an unprotected ordinary round-based implementation. To that end, td + 1 masking is used which requires a notable area, and – apart from the difficulty to construct – its extension to higher orders seems beyond the bounds of feasibility.In this paper, we focus on d + 1 masking and introduce a methodology which enables us to stay with the latency of an unprotected round-based implementation, i.e., one register stage per round. While being secure under glitch-extended probing model, we provide a general design where the desired security order can be easily adjusted without any effect on the above-given latency. Compared to the Rhythmic-Keccak, the synthesis results show that our first-order design is able to accomplish the entire operations of Keccak-f[200] in the same period of time while decreasing the area by 74.5%. Notably, our implementations achieve around 30% less delay compared to the corresponding original DOM-Keccak designs.