The Ring-Learning With Errors (RLWE) problem forms the backbone of highly efficient Fully Homomorphic Encryption (FHE) schemes. A significant component of an RLWE public key or ciphertext of the form $(b,a)$ is the uniformly random polynomial $a \in R_q$. While $a$ is essential for security, the communication overhead of transmitting it from client to server and feeding it into a hardware accelerator can be substantial, especially for FHE accelerators targeting high acceleration factors. A known technique for reducing this overhead generates $a$ from a small seed on the client side via a deterministic process, transmits only the seed, and regenerates $a$ on the fly within the accelerator. Hardware implementation challenges include wiring (density and power), compute area, compute power, and flexibility in scheduling the on-the-fly generation instructions. This extended abstract proposes a concrete scheme and parameters that address these practical challenges. We detail the benefits of our approach, which maintains the reduction in communication latency and memory footprint while allowing parallel generation of uniformly distributed samples, relaxed wiring requirements, and unrestricted random access to RNS limbs, and which incurs an extremely low overhead on the client side (less than 3%) during key generation. The proposed scheme eliminates the need for thick metal layers for randomness distribution and prevents the power consumption of the PRNG subsystem from scaling prohibitively with the acceleration factor, potentially saving tens of watts per accelerator chip in high-throughput configurations.
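To make the seed-expansion idea concrete, the following is a minimal Python sketch of the generic technique, not the scheme proposed in this abstract. It assumes SHAKE-128 as the expansion function, rejection sampling to obtain uniform residues, and domain separation by limb index and block counter. The function name `expand_limb`, the block size, and the toy modulus in the usage note are hypothetical choices made only for illustration.

```python
import hashlib


def expand_limb(seed: bytes, limb: int, q: int, n: int) -> list[int]:
    """Derive n coefficients, uniform in [0, q), for one RNS limb of a.

    Illustrative assumptions: SHAKE-128 as the XOF, a 64-candidate block
    size, and (limb, counter) domain separation.
    """
    bits = q.bit_length()
    nbytes = (bits + 7) // 8
    mask = (1 << bits) - 1
    coeffs: list[int] = []
    counter = 0
    while len(coeffs) < n:
        # Each (limb, counter) pair selects an independent output block,
        # so a limb can be generated without touching any other limb.
        xof = hashlib.shake_128(
            seed + limb.to_bytes(4, "little") + counter.to_bytes(4, "little")
        )
        block = xof.digest(nbytes * 64)
        for i in range(0, len(block), nbytes):
            cand = int.from_bytes(block[i:i + nbytes], "little") & mask
            # Rejection sampling: keep only candidates below q so the
            # accepted values are uniform modulo q.
            if cand < q and len(coeffs) < n:
                coeffs.append(cand)
        counter += 1
    return coeffs
```

In this sketch, client and accelerator both call `expand_limb(seed, limb, q, n)` and obtain identical coefficients, so only the seed needs to be transmitted. Because the limb index is part of the hash input, a call such as `expand_limb(seed, 2, 12289, 256)` does not depend on limbs 0 and 1 having been generated first, which is the software analogue of random access to RNS limbs.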