Adapting AlphaEvolve to Optimize Fully Homomorphic Encryption on TPUs

The deployment of Fully Homomorphic Encryption (FHE) at scale is hindered by its heavy computational overhead. While specialized hardware accelerators like Google Tensor Processing Units (TPUs) can help, mapping complex cryptographic kernels onto such architectures remains a challenge. Efficient execution requires co-optimizing the systolic array-based Matrix Multiplication Unit (MXU) and the Vector Processing Units (VPUs), as well as orchestrating data movement across the vector register files. Existing compiler stacks often abstract away low-level hardware utilization, leaving developers with a manual trial-and-error process that often results in fragmented execution and underutilized resources. To accelerate this development process, we use AlphaEvolve to automate the exploration of hardware-aware cryptographic-kernel optimizations. We frame optimization as an evolutionary search problem and use the closed-loop system provided by AlphaEvolve, which leverages LLM-driven code generation. Real-world feedback from hardware execution and rigorous correctness testing guide the evolution process. We evaluate AlphaEvolve optimization on primitives for both the TFHE (Jaxite) and CKKS (CROSS) FHE schemes on Google Cloud TPUv5e, a contemporary TPU architecture. Within 24 hours of automated exploration, AlphaEvolve discovered implementation-level optimizations that improve TFHE bootstrap latency by 2.5x and CKKS rotation and multiplication latency by 1.31x and 1.18x, respectively, relative to the human-engineered state of the art. These results demonstrate that AlphaEvolve can help researchers navigate the optimization trade-offs between cryptography, compilers, and hardware accelerators.
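The closed loop described above (propose a variant, reject it if it fails correctness testing, keep it if it runs faster) can be illustrated with a minimal sketch. This is not AlphaEvolve's implementation or API. Every name here is hypothetical: `propose` is a random stand-in for LLM-driven code generation, `proxy_latency` is a made-up cost model standing in for on-TPU timing, and the modulus `Q` and the tiled modular-multiply kernel are toy choices made only for illustration.

```python
import random

Q = 2**31 - 1  # hypothetical modulus for the toy kernel (assumption)


def reference_kernel(a, b):
    """Trusted baseline: elementwise modular multiplication."""
    return [(x * y) % Q for x, y in zip(a, b)]


def make_candidate(tile, buggy=False):
    """Build a tiled kernel variant; `buggy` omits the modular reduction."""
    def kernel(a, b):
        out = []
        for start in range(0, len(a), tile):
            chunk = zip(a[start:start + tile], b[start:start + tile])
            for x, y in chunk:
                out.append(x * y if buggy else (x * y) % Q)
        return out
    return kernel


def is_correct(kernel, trials=8, n=64, seed=0):
    """Correctness gate: compare against the reference on fixed inputs."""
    # Edge case first: the largest residues always require a reduction.
    cases = [([Q - 1] * n, [Q - 1] * n)]
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randrange(Q) for _ in range(n)]
        b = [rng.randrange(Q) for _ in range(n)]
        cases.append((a, b))
    return all(kernel(a, b) == reference_kernel(a, b) for a, b in cases)


def proxy_latency(tile):
    """Made-up cost model standing in for measured hardware latency."""
    return abs(tile - 128) + 1.0


def propose(parent_tile, rng):
    """Stand-in for LLM mutation: halve, keep, or double the tile size,
    and occasionally emit an incorrect program."""
    tile = max(1, parent_tile * rng.choice([1, 2]) // rng.choice([1, 2]))
    return tile, rng.random() < 0.2


def evolve(generations=30, population=4, seed=0):
    """Greedy evolutionary loop: only correct, faster candidates survive."""
    rng = random.Random(seed)
    best_tile = 8
    best_cost = proxy_latency(best_tile)
    for _ in range(generations):
        for _ in range(population):
            tile, buggy = propose(best_tile, rng)
            if not is_correct(make_candidate(tile, buggy)):
                continue  # failed correctness testing, never timed
            cost = proxy_latency(tile)
            if cost < best_cost:
                best_tile, best_cost = tile, cost
    return best_tile, best_cost


if __name__ == "__main__":
    tile, cost = evolve()
    print(f"best tile size: {tile}, proxy latency: {cost}")
```

The sketch checks correctness before measuring cost, so an incorrect candidate can never be selected for being fast. In a real setting the cost would come from timing the kernel on the target device, and the population would typically hold many programs rather than a single best one.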
