We introduce \emph{CAT}, an open-source, GPU-accelerated fully homomorphic encryption (FHE) framework that surpasses existing solutions in both functionality and efficiency. \emph{CAT} features a three-layer architecture: a foundation of core math, a bridge of pre-computed elements and combined operations, and an API-accessible layer of FHE operators. It uses techniques such as parallel execution of operations, well-defined layout patterns for cipher data, kernel fusion/segmentation, and dual GPU pools to improve overall execution efficiency. In addition, a memory management mechanism makes the framework suitable for server-side deployment and prevents data leakage. Based on our framework, we implement three widely used FHE schemes: CKKS, BFV, and BGV. The results show that, for specific operations, our implementation on an NVIDIA 4090 achieves up to a 2173$\times$ speedup over a CPU implementation and up to 1.25$\times$ over state-of-the-art GPU acceleration work. We also validate the framework in an application scenario, CKKS-based Privacy Database Queries, achieving a 33$\times$ speedup over the CPU counterpart. Every query task handles datasets of up to $10^3$ rows on a single GPU within 1 second, using 2--5 GB of storage. Our implementation has undergone extensive stability testing and can be easily deployed on commercial GPUs. We hope that this work will significantly advance the integration of state-of-the-art FHE algorithms into diverse real-world systems by providing a robust, industry-ready, open-source tool.