PackVFL: Efficient HE Packing for Vertical Federated Learning

As an essential tool for secure distributed machine learning, vertical federated learning (VFL) based on homomorphic encryption (HE) suffers from severe efficiency problems due to data inflation and time-consuming operations. To this end, we propose PackVFL, an efficient VFL framework based on packed HE (PackedHE), to accelerate existing HE-based VFL algorithms. PackVFL packs multiple cleartexts into one ciphertext and supports single-instruction-multiple-data (SIMD)-style parallelism. We focus on designing a high-performance matrix multiplication (MatMult) method, since MatMult accounts for most of the ciphertext computation time in HE-based VFL. Devising such a method is challenging for PackedHE because a slight difference in the packing scheme can substantially affect its computation and communication costs. Without domain-specific design, directly applying state-of-the-art (SOTA) MatMult methods struggles to achieve optimal performance. Therefore, we make a three-fold design: 1) we systematically explore the current design space of MatMult and quantify the complexity of existing approaches to provide guidance; 2) we propose a hybrid MatMult method tailored to the unique characteristics of VFL; 3) we adaptively apply our hybrid method in representative VFL algorithms, leveraging their distinctive algorithmic properties to further improve efficiency. As the batch size, feature dimension, and model size of VFL scale up, PackVFL consistently delivers improved performance. Empirically, PackVFL achieves up to a 51.52X end-to-end speedup over existing VFL algorithms, a 34.51X greater speedup than that obtained by directly applying SOTA MatMult methods.
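To make the packing idea concrete, the following is a minimal cleartext sketch of one well-known packed matrix-vector multiplication, the diagonal method, in which each generalized diagonal of the matrix is multiplied slot-wise with a rotated copy of the packed vector. It is not PackVFL's hybrid method. The names `rotate` and `diagonal_matvec` are hypothetical, and NumPy arrays stand in for ciphertext slots, so no actual encryption takes place.

```python
import numpy as np


def rotate(slots, k):
    """Cleartext stand-in for a homomorphic left rotation by k slots (assumption)."""
    return np.roll(slots, -k)


def diagonal_matvec(matrix, packed_vec):
    """Multiply a square matrix by a packed vector using the diagonal method.

    Returns the product and the number of non-trivial rotations used,
    which is the kind of cost a packing scheme is usually judged by.
    """
    n = matrix.shape[0]
    acc = np.zeros(n)
    rotations = 0
    for i in range(n):
        # i-th generalized diagonal: entries matrix[j, (j + i) mod n]
        diag = np.array([matrix[j, (j + i) % n] for j in range(n)])
        # One slot-wise (SIMD-style) multiply-accumulate over all n slots
        acc += diag * rotate(packed_vec, i)
        if i > 0:
            rotations += 1
    return acc, rotations


if __name__ == "__main__":
    m = np.arange(16, dtype=float).reshape(4, 4)
    v = np.array([1.0, 2.0, 3.0, 4.0])
    result, rots = diagonal_matvec(m, v)
    print(result, rots)
    print(np.allclose(result, m @ v))
```

In this sketch an n-by-n product costs n slot-wise multiplications and n - 1 rotations, whereas a different packing layout would change both counts. This illustrates why the choice of packing scheme affects computation and communication costs.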
