Computing on encrypted data is a promising approach to reducing data security and privacy risks, with homomorphic encryption as a key enabler. In this work, we accelerate homomorphic operations using the Processing-in-Memory (PIM) paradigm to mitigate the large memory capacity and frequent data movement that these operations require. Using a real-world PIM system, we accelerate the Brakerski-Fan-Vercauteren (BFV) scheme for homomorphic addition and multiplication. We evaluate the PIM implementations of these homomorphic operations with statistical workloads (arithmetic mean, variance, linear regression) and compare them to CPU and GPU implementations. Our results demonstrate a 50-100x speedup with a real PIM system (UPMEM) over the CPU and a 2-15x speedup over the GPU in vector addition. For vector multiplication, the real PIM system outperforms the CPU by 40-50x. However, it lags 10-15x behind the GPU because the evaluated first-generation real PIM system lacks native support for sufficiently wide multiplication. For mean, variance, and linear regression, the real PIM system's performance improvements range from 30x to 300x over the CPU and from 10x to 30x over the GPU, uncovering real PIM system trade-offs in the scalability of homomorphic operations for varying amounts of data. We plan to make our implementation open-source in the future.
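To illustrate why homomorphic addition is a natural fit for a memory-centric architecture, the following is a minimal sketch of BFV-style ciphertext addition, which reduces to coefficient-wise modular addition over the polynomials that make up each ciphertext. It is not the implementation evaluated in this work: the modulus `Q`, the helper names `poly_add` and `ciphertext_add`, and the list-based ciphertext layout are all assumptions chosen for illustration, and key generation, encryption, and noise handling are omitted.

```python
# Toy sketch of BFV-style ciphertext addition (illustrative only).
# Q is a hypothetical ciphertext modulus, not a parameter from the paper.
Q = 2**27 + 1


def poly_add(a, b, q=Q):
    """Add two polynomials coefficient by coefficient, modulo q."""
    if len(a) != len(b):
        raise ValueError("polynomials must have the same number of coefficients")
    return [(x + y) % q for x, y in zip(a, b)]


def ciphertext_add(ct_a, ct_b, q=Q):
    """Add two ciphertexts, each modeled as a tuple of polynomials.

    Each component polynomial is added independently, so the work is
    a streaming pass over memory with no cross-coefficient dependencies.
    """
    if len(ct_a) != len(ct_b):
        raise ValueError("ciphertexts must have the same number of components")
    return tuple(poly_add(pa, pb, q) for pa, pb in zip(ct_a, ct_b))


if __name__ == "__main__":
    ct_a = ([1, 2, 3, 4], [Q - 1, 0, 5, 6])
    ct_b = ([10, 20, 30, 40], [2, 7, 8, 9])
    print(ciphertext_add(ct_a, ct_b))
```

Because every coefficient is processed independently, the operation is dominated by data movement rather than arithmetic, which is the kind of workload the abstract targets with PIM. Homomorphic multiplication, by contrast, needs wide integer products, which is where the abstract reports the first-generation PIM hardware falling behind the GPU.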