Privacy-preserving machine learning enables models to be trained on decentralized datasets without revealing the data, on both horizontally and vertically partitioned data. However, it relies on specialized techniques and algorithms to perform the necessary computations. The privacy-preserving scalar product protocol, which computes the dot product of vectors without revealing them, is a popular example because of its versatility. Unfortunately, the solutions currently proposed in the literature focus mainly on two-party scenarios, even though scenarios with more data parties are becoming increasingly relevant. This is the case, for example, when performing analyses that require counting the number of samples that fulfill criteria defined across several sites, such as calculating the information gain at a node in a decision tree. In this paper, we propose a generalization of the protocol to an arbitrary number of parties, based on an existing two-party method. Our proposed solution relies on recursively resolving smaller scalar products. After describing the method, we discuss potential scalability issues. Finally, we describe the privacy guarantees, identify remaining concerns, and compare the proposed method to the original solution in this respect.
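To make the counting use case concrete, the following minimal sketch shows the quantity that an n-party scalar product yields when each site holds a binary indicator vector over the same, identically ordered samples. It is a plaintext illustration only and deliberately not the privacy-preserving protocol: all vectors are combined in one place, which is exactly what the protocol is meant to avoid. The function name `n_party_scalar_product` and the site vectors are hypothetical and chosen for illustration.

```python
from functools import reduce
from operator import mul


def n_party_scalar_product(vectors):
    """Plaintext n-party scalar product: sum over samples of the product
    of every party's entry for that sample (no privacy protection)."""
    if not vectors:
        raise ValueError("at least one vector is required")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")
    # zip(*vectors) groups the entries that belong to the same sample.
    return sum(reduce(mul, entries, 1) for entries in zip(*vectors))


# Hypothetical indicator vectors: entry i is 1 if sample i fulfills the
# criterion held at that site, and 0 otherwise.
site_a = [1, 0, 1, 1, 0]
site_b = [1, 1, 1, 0, 0]
site_c = [0, 1, 1, 1, 0]

# Number of samples that fulfill the criteria of all three sites at once.
count = n_party_scalar_product([site_a, site_b, site_c])
print(count)
```

With binary indicators, a sample contributes 1 to the sum only if every site marks it with 1, so the result is the number of samples that satisfy all criteria simultaneously. Counts of this kind are the ingredients of quantities such as the information gain at a decision tree node.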