Privacy-preserving machine learning enables models to be trained on decentralized datasets without revealing the data, on both horizontally and vertically partitioned data. However, it relies on specialized techniques and algorithms to perform the necessary computations. The privacy-preserving scalar product protocol, which computes the dot product of vectors without revealing them, is one popular example, valued for its versatility. Unfortunately, the solutions currently proposed in the literature focus mainly on two-party scenarios, even though scenarios with a larger number of data parties are becoming more relevant, for example when performing analyses that require counting the number of samples that fulfill criteria defined across several sites, such as calculating the information gain at a node in a decision tree. In this paper, we propose a generalization of the protocol to an arbitrary number of parties, based on an existing two-party method. Our proposed solution relies on the recursive resolution of smaller scalar products. After describing our proposed method, we discuss potential scalability issues. Finally, we describe the privacy guarantees, identify any concerns, and compare the proposed method to the original solution in this respect.
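To make the counting use case concrete, the following is a minimal plaintext sketch of the quantity an n-party scalar product computes: the sum over samples of the product of each party's entry. With binary indicator vectors, this equals the number of samples that satisfy every site's criterion. It is a reference computation only and is not privacy-preserving; it does not reproduce the proposed protocol. The function name `n_party_scalar_product` and the three site vectors are hypothetical, chosen purely for illustration.

```python
from math import prod


def n_party_scalar_product(vectors):
    """Plaintext reference: sum over i of the product of the i-th entries.

    This reveals all inputs and is NOT the privacy-preserving protocol;
    it only shows which value such a protocol is meant to output.
    """
    lengths = {len(v) for v in vectors}
    if len(lengths) != 1:
        raise ValueError("all vectors must have the same length")
    return sum(prod(entries) for entries in zip(*vectors))


# Hypothetical indicator vectors, one per site: entry i is 1 if sample i
# fulfills that site's local criterion, and 0 otherwise.
site_a = [1, 1, 0, 1, 1]
site_b = [1, 0, 0, 1, 1]
site_c = [0, 1, 1, 1, 1]

# Number of samples that fulfill the criteria at all three sites.
count = n_party_scalar_product([site_a, site_b, site_c])
```

In a decision-tree setting, counts of this kind would feed the class frequencies used to compute information gain at a node, which is why a multi-party scalar product is sufficient for that analysis.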