To preserve data privacy, multi-party computation (MPC) enables executing Machine Learning (ML) algorithms on private data. However, MPC frameworks do not include optimized operations on sparse data. This absence makes them unsuitable for ML applications involving sparse data, e.g., recommender systems or genomics. Even in plaintext, such applications involve high-dimensional sparse data that cannot be processed without sparsity-related optimizations due to prohibitively large memory requirements. Since matrix multiplication is a central building block of ML algorithms, our work proposes dedicated MPC algorithms to multiply secret-shared sparse matrices. Our sparse algorithms have several advantages over secure dense matrix multiplication (i.e., the classic multiplication). On the one hand, they avoid the memory issues caused by the "dense" data representation of dense multiplication. On the other hand, our algorithms can significantly reduce communication costs (by up to $1000\times$) for realistic problem sizes. We validate our algorithms in two machine learning applications where dense matrix multiplications are impractical. Finally, we take inspiration from real-world sparse data properties to build three techniques minimizing the public knowledge necessary to secure sparse algorithms.