LRD-MPC: Efficient MPC Inference through Low-rank Decomposition

Secure Multi-party Computation (MPC) enables mutually distrusting parties to jointly compute a function without revealing their inputs. Its application to machine learning (ML) has gained significant attention, particularly for secure inference services deployed across multiple cloud virtual machines (VMs), where each VM acts as an MPC party. Model providers secret-share model weights and users secret-share inputs, so each server operates only on random shares. While MPC provides strong cryptographic guarantees, it incurs substantial computational and communication overhead. Deep neural networks rely heavily on convolutional and fully connected layers, which require costly matrix multiplications in MPC. To reduce this cost, we propose leveraging low-rank decomposition (LRD) for linear layers, replacing one large matrix multiplication with two smaller ones. Applying LRD in MPC, however, introduces two overheads. First, each matrix multiplication in MPC incurs a round of communication, so decomposing one matrix multiplication into two adds a communication round. Second, the added matrix multiplication requires an additional truncation step to maintain numerical precision. Since truncation itself requires communication and computation, these overheads can offset the gains from decomposition. To address this, we introduce two complementary optimizations: truncation skipping and efficient linear layer concatenation. Truncation skipping removes the extra truncation induced by LRD, while linear layer concatenation pipelines operations to hide the additional communication round. Together, these techniques mitigate the main overheads of LRD in MPC and improve overall efficiency. Our approach is broadly applicable across MPC protocols. Experiments show up to 25% speedup in n-PC protocols and up to 33% in 3-PC protocols over full-rank baselines, along with up to 52% GPU energy savings and up to 88% reduction in offline-phase latency.
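To make the trade-off concrete, the following is a minimal plaintext sketch, not the paper's protocol. It models neither secret sharing nor communication. It only illustrates (a) why two low-rank factors need fewer multiplications than one full-rank matrix, and (b) the fixed-point bookkeeping behind one plausible reading of truncation skipping: leave the intermediate product untruncated and remove the accumulated fractional bits in a single final truncation. The fractional precision `F`, the layer sizes, and all function names are assumptions chosen for illustration.

```python
# Plaintext illustration only: no secret shares, no communication rounds.
F = 8  # assumed number of fractional bits in the fixed-point encoding


def encode(m):
    """Encode a float matrix as fixed-point integers with F fractional bits."""
    return [[round(v * (1 << F)) for v in row] for row in m]


def decode(m, frac_bits=F):
    """Decode a fixed-point integer matrix back to floats."""
    return [[v / (1 << frac_bits) for v in row] for row in m]


def matmul(a, b):
    """Matrix product of two nested lists (works for ints and floats)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def truncate(m, bits):
    """Drop `bits` fractional bits (arithmetic shift, rounds toward -inf)."""
    return [[v >> bits for v in row] for row in m]


def mult_count(batch, d_in, d_out):
    """Scalar multiplications in a (batch x d_in) by (d_in x d_out) product."""
    return batch * d_in * d_out


# (a) Cost: a 512x512 layer versus rank-64 factors, batch of 64 (assumed sizes).
full_mults = mult_count(64, 512, 512)
lrd_mults = mult_count(64, 512, 64) + mult_count(64, 64, 512)

# (b) Fixed-point bookkeeping on a tiny layer W = U @ V of rank 2.
x = [[0.5, -1.25, 2.0, 0.75]]
U = [[1.0, 0.5], [-0.25, 1.5], [0.75, -1.0], [2.0, 0.25]]
V = [[0.5, -1.0, 1.25], [1.5, 0.25, -0.75]]
reference = matmul(matmul(x, U), V)  # float reference result

X, Uq, Vq = encode(x), encode(U), encode(V)

# Baseline LRD: truncate after each product (two truncations).
h = truncate(matmul(X, Uq), F)        # 2F fractional bits -> F
y_two_trunc = decode(truncate(matmul(h, Vq), F))

# Skipping the intermediate truncation: h keeps 2F fractional bits, the
# second product has 3F, and one final truncation removes 2F of them.
h_wide = matmul(X, Uq)
y_skip = decode(truncate(matmul(h_wide, Vq), 2 * F))

print(full_mults, lrd_mults)
print(reference, y_two_trunc, y_skip)
```

In this sketch the intermediate value carries `2 * F` fractional bits, so the underlying ring or integer type needs enough headroom to hold `3 * F` fractional bits plus the integer part before the final truncation. Whether a given MPC protocol has that headroom is an assumption that must be checked per deployment.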
