Secure Summation via Subset Sums: A New Primitive for Privacy-Preserving Distributed Machine Learning

Population studies and the training of complex machine learning models often require gathering data from different actors. In these applications, summation is an important primitive, used for computing means, counts, or mini-batch gradients. In many cases, the data is privacy-sensitive and therefore cannot be collected on a central server, so the summation needs to be performed in a distributed and privacy-preserving way. Existing solutions for distributed summation with computational privacy guarantees make trust or connection assumptions (e.g., the existence of a trusted server or peer-to-peer connections between clients) that might not be fulfilled in real-world settings. Motivated by these challenges, we propose Secure Summation via Subset Sums (S5), a method for distributed summation that works in the presence of a malicious server and only two honest clients, and without the need for peer-to-peer connections between clients. S5 adds zero-sum noise to clients' messages and shuffles them before sending them to the aggregating server. Our main contribution is a proof that this scheme yields a computational privacy guarantee based on the multidimensional subset sum problem. Our analysis of this problem may be of independent interest for other privacy and cryptography applications.
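
The zero-sum-noise-and-shuffle idea can be illustrated with a small sketch. This is not the S5 protocol itself, whose details the abstract does not give. It assumes integer values summed modulo a fixed modulus, each client splitting its value into several messages, and a shuffling step that is simply simulated in-process. All names (`MODULUS`, `mask_value`, `shuffle_messages`, `aggregate`) are hypothetical.

```python
import random
import secrets

MODULUS = 2**32  # assumption: all arithmetic is done modulo a fixed M


def mask_value(value, num_messages, modulus=MODULUS):
    """Turn one private value into several messages that sum to it.

    The noise terms sum to zero modulo `modulus`, so they cancel
    once every message has been added up.
    """
    noise = [secrets.randbelow(modulus) for _ in range(num_messages - 1)]
    noise.append((-sum(noise)) % modulus)  # forces the noise to sum to zero
    # Hide the value inside the first noise term; the rest are pure noise.
    messages = [noise[0] + value] + noise[1:]
    return [m % modulus for m in messages]


def shuffle_messages(per_client_messages):
    """Pool all clients' messages and shuffle them (simulated shuffler)."""
    pool = [m for msgs in per_client_messages for m in msgs]
    random.SystemRandom().shuffle(pool)
    return pool


def aggregate(pool, modulus=MODULUS):
    """Server side: add up everything it receives."""
    return sum(pool) % modulus


# Illustrative inputs only: three clients, four messages each.
client_values = [3, 10, 7]
pool = shuffle_messages([mask_value(v, 4) for v in client_values])
total = aggregate(pool)
print(total)
```

The server sees only a shuffled pool of random-looking numbers, yet their sum equals the sum of the clients' values. Working out which subset of messages belongs to one client is an instance of a subset sum problem, which is the kind of hardness the abstract's privacy guarantee refers to.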

