Samplable Anonymous Aggregation for Private Federated Data Analysis

Kunal Talwar, Shan Wang, Audra McMillan, Vojta Jina, Vitaly Feldman, Pansy Bansal, Bailey Basile, Aine Cahill, Yi Sheng Chan, Mike Chatzidakis, Junye Chen, Oliver Chick, Mona Chitnis, Suman Ganta, Yusuf Goren, Filip Granqvist, Kristine Guo, Frederic Jacobs, Omid Javidbakht, Albert Liu, Richard Low, Dan Mascenik, Steve Myers, David Park, Wonhee Park, Gianni Parsa, Tommy Pauly, Christian Priebe, Rehan Rishi, Guy Rothblum, Michael Scaria, Linmao Song, Congzheng Song, Karl Tarbe, Sebastian Vogt, Luke Winstrom, Shundong Zhou

From arXiv, 34 pages

We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its own private data. Locally differentially private algorithms require little trust but are (provably) limited in their utility. Centrally differentially private algorithms can achieve significantly better utility but require a trusted curator. This gap has led to significant interest in the design and implementation of simple cryptographic primitives that allow central-like utility guarantees without having to trust a central server. Our first contribution is a new primitive that allows efficient implementation of several commonly used algorithms and supports privacy accounting close to that of the central setting, without requiring the strong trust assumptions that setting entails. {\em Shuffling} and {\em aggregation} primitives proposed in earlier works enable this for some algorithms, but have significant limitations as primitives. We propose a {\em Samplable Anonymous Aggregation} primitive, which computes an aggregate over a random subset of the inputs, and show that it leads to better privacy-utility trade-offs for various fundamental tasks. Our second contribution is a system architecture that implements this primitive, together with a security analysis of the proposed system. Our design combines additive secret-sharing with anonymization and authentication infrastructures.
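To make the primitive's interface concrete, the following is a minimal simulation sketch of aggregating additive secret shares over a random subset of inputs. It is not the system described in the paper: the function names (`share`, `samplable_aggregate`), the modulus `2**32`, the two-server default, and the per-input Bernoulli sampling are all assumptions made for illustration, and the anonymization and authentication layers are omitted entirely. A real deployment would also draw share randomness from a cryptographically secure source rather than Python's `random` module.

```python
import random

MODULUS = 2**32  # assumed modulus for share arithmetic (illustrative choice)


def share(value, num_servers, rng):
    """Split `value` into additive shares that sum to `value` mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(num_servers - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def samplable_aggregate(inputs, sample_prob, num_servers=2, seed=None):
    """Simulate summing a random subset of `inputs` via additive shares.

    Each input is included independently with probability `sample_prob`
    (an assumed sampling rule). Each server only ever accumulates one
    share per sampled input; the sum is recovered by combining the
    per-server totals.
    """
    rng = random.Random(seed)  # NOT cryptographically secure; simulation only
    server_totals = [0] * num_servers
    num_sampled = 0
    for value in inputs:
        if rng.random() >= sample_prob:
            continue  # this input is not part of the sampled subset
        num_sampled += 1
        for i, s in enumerate(share(value, num_servers, rng)):
            server_totals[i] = (server_totals[i] + s) % MODULUS
    aggregate = sum(server_totals) % MODULUS
    return aggregate, num_sampled


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    total, count = samplable_aggregate(data, sample_prob=0.5, seed=7)
    print(f"sum over {count} sampled inputs: {total}")
```

In this sketch, no single server total reveals an individual input, since each share on its own is uniformly distributed modulo `MODULUS`; only the combined totals yield the sum over the sampled subset.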
