Population studies and the training of complex machine learning models often require gathering data from different actors. In these applications, summation is an important primitive, used for computing means, counts, or mini-batch gradients. In many cases, the data is privacy-sensitive and therefore cannot be collected on a central server, so the summation must be performed in a distributed and privacy-preserving way. Existing solutions for distributed summation with computational privacy guarantees make trust or connection assumptions (e.g., the existence of a trusted server or peer-to-peer connections between clients) that might not be fulfilled in real-world settings. Motivated by these challenges, we propose Secure Summation via Subset Sums (S5), a method for distributed summation that works in the presence of a malicious server and with only two honest clients, and without the need for peer-to-peer connections between clients. S5 adds zero-sum noise to clients' messages and shuffles them before sending them to the aggregating server. Our main contribution is a proof that this scheme yields a computational privacy guarantee based on the multidimensional subset sum problem. Our analysis of this problem may be of independent interest for other privacy and cryptography applications.
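The idea of zero-sum noise followed by shuffling can be illustrated with a minimal sketch. The abstract does not specify the protocol, so everything below is an assumption made for illustration and not the S5 construction itself: the public modulus `MODULUS`, the choice to split each client's value into several messages, and the helper names `mask`, `shuffle_messages`, and `aggregate` are all hypothetical. The sketch only shows why the noise cancels in the sum; it carries none of the privacy analysis.

```python
import random
import secrets

MODULUS = 2**32  # hypothetical public modulus, not taken from the paper


def mask(value, num_messages, modulus=MODULUS):
    """Split `value` into messages whose noise terms sum to zero mod `modulus`."""
    # Draw all but one noise term uniformly at random.
    noise = [secrets.randbelow(modulus) for _ in range(num_messages - 1)]
    # The last term is chosen so that the total noise is 0 mod `modulus`.
    noise.append((-sum(noise)) % modulus)
    # Assumption: the value rides on the first message, the rest are pure noise.
    messages = [noise[0] + value] + noise[1:]
    return [m % modulus for m in messages]


def shuffle_messages(batches):
    """Pool the messages of all clients and shuffle them (illustrative only)."""
    pool = [m for batch in batches for m in batch]
    random.SystemRandom().shuffle(pool)
    return pool


def aggregate(pool, modulus=MODULUS):
    """Server side: the zero-sum noise cancels, leaving the sum of the values."""
    return sum(pool) % modulus


client_values = [12, 7, 30]
pool = shuffle_messages([mask(v, 4) for v in client_values])
total = aggregate(pool)
```

Because each client's noise sums to zero modulo `MODULUS`, the server recovers the sum of the client values (as long as that sum is smaller than the modulus) while seeing only a shuffled pool of masked messages.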