TY - JOUR
T1 - Area-Time-Efficient Scalable Schoolbook Polynomial Multiplier for Lattice-Based Cryptography
AU - Birgani, Yahya Arzani
AU - Timarchi, Somayyeh
AU - Khalid, Ayesha
N1 - © 2022, IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. This is the accepted manuscript version of an article which has been published in final form at https://doi.org/10.1109/TCSII.2022.3188943
PY - 2022/12/1
Y1 - 2022/12/1
N2 - Lattice-based cryptography (LBC) stands out as one of the most viable classes of quantum-resistant schemes. This brief explores a time-sharing approach, with different parallelism levels, for a crucial operation in LBC cryptosystems, i.e., polynomial multiplication. We also employ an innovative coefficient ordering method in our time-shared schoolbook polynomial multiplication (SPM) to combine the best of two worlds: design compactness and lower processing latency. Thus, our work offers a choice of design points with performance vs. resource trade-offs. Our fastest proposed design exhibits 80% and 57% reductions in LUTs and throughput, respectively, compared to the existing fully parallel SPM architecture (on Xilinx UltraScale+), which leads to a 53% improvement in area-time-product efficiency. Our smallest proposed design is more than 2.2× faster than the existing low-cost parallel SPM architecture (on Xilinx Kintex-7) at the expense of 85% additional area resources.
AB - Lattice-based cryptography (LBC) stands out as one of the most viable classes of quantum-resistant schemes. This brief explores a time-sharing approach, with different parallelism levels, for a crucial operation in LBC cryptosystems, i.e., polynomial multiplication. We also employ an innovative coefficient ordering method in our time-shared schoolbook polynomial multiplication (SPM) to combine the best of two worlds: design compactness and lower processing latency. Thus, our work offers a choice of design points with performance vs. resource trade-offs. Our fastest proposed design exhibits 80% and 57% reductions in LUTs and throughput, respectively, compared to the existing fully parallel SPM architecture (on Xilinx UltraScale+), which leads to a 53% improvement in area-time-product efficiency. Our smallest proposed design is more than 2.2× faster than the existing low-cost parallel SPM architecture (on Xilinx Kintex-7) at the expense of 85% additional area resources.
KW - Computer architecture
KW - Cryptography
KW - Random access memory
KW - Pipeline processing
KW - DH-HEMTs
KW - Costs
UR - https://ieeexplore.ieee.org/document/9816112/
U2 - 10.1109/TCSII.2022.3188943
DO - 10.1109/TCSII.2022.3188943
M3 - Article
SN - 1549-7747
VL - 69
SP - 5079
EP - 5083
JO - IEEE Transactions on Circuits and Systems II: Express Briefs
JF - IEEE Transactions on Circuits and Systems II: Express Briefs
IS - 12
M1 - 9816112
ER -