Differentially Private Selection from Secure Distributed Computing

Given a collection of vectors $x^{(1)},\dots,x^{(n)} \in \{0,1\}^d$, the selection problem asks for the index of an "approximately largest" entry in $x=\sum_{j=1}^n x^{(j)}$. Selection abstracts a host of problems: in machine learning it can be used for hyperparameter tuning, feature selection, or to model empirical risk minimization. We study selection under differential privacy, where the released index must preserve the privacy of each vector. Though selection can be solved with an excellent utility guarantee in the central model of differential privacy, the distributed setting lacks solutions. Specifically, strong privacy guarantees with high utility are available in high-trust settings, but not in low-trust settings. For example, in the popular shuffle model of distributed differential privacy, there are strong lower bounds suggesting that the utility of the central model cannot be obtained. In this paper we design a protocol for differentially private selection in a trust setting similar to the shuffle model, with the crucial difference that our protocol tolerates corrupted servers while maintaining privacy. Our protocol uses techniques from secure multi-party computation (MPC) and: (i) has utility on par with the best mechanisms in the central model, (ii) scales to large, distributed collections of high-dimensional vectors, and (iii) uses $k\geq 3$ servers that collaborate to compute the result, where differential privacy holds assuming an honest majority. Since general-purpose MPC techniques are not sufficiently scalable, we propose a novel application of integer secret sharing, and evaluate the utility and efficiency of our protocol theoretically and empirically. Our protocol is the first to demonstrate that large-scale differentially private selection is possible in a distributed setting.
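To make the problem statement concrete, the following is a minimal sketch of two ingredients the abstract mentions: splitting each client's vector into additive integer shares across $k$ servers, and selecting an approximately largest coordinate of the sum with a standard central-model mechanism (report-noisy-max with Gumbel noise, which is equivalent to the exponential mechanism). It is not the paper's protocol. In this sketch the sum is reconstructed and the noise is added in the clear, whereas the paper performs the selection inside MPC. The function names, the modulus, and the choice of additive sharing are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical modulus; it only needs to exceed n so that sums do not wrap.
MODULUS = 2**31


def share_vector(x, k, rng, modulus=MODULUS):
    """Split integer vector x into k additive shares modulo `modulus`.

    Any k-1 of the shares are uniformly random, so they reveal nothing
    about x on their own.
    """
    x = np.asarray(x, dtype=np.int64)
    shares = rng.integers(0, modulus, size=(k - 1, x.shape[0]), dtype=np.int64)
    last = (x - shares.sum(axis=0)) % modulus
    return np.vstack([shares, last])  # shape (k, d)


def aggregate_shares(all_shares, modulus=MODULUS):
    """Sum shares per server, then combine the per-server sums.

    all_shares has shape (n, k, d): n clients, k servers, d coordinates.
    ASSUMPTION: the combined sum is revealed in the clear here, which the
    paper's protocol avoids.
    """
    per_server = all_shares.sum(axis=0) % modulus  # shape (k, d)
    return per_server.sum(axis=0) % modulus        # shape (d,)


def noisy_argmax(counts, epsilon, rng):
    """Report-noisy-max with Gumbel noise (exponential mechanism).

    Each input vector changes every coordinate of the sum by at most 1,
    so the sensitivity is 1 and the Gumbel scale is 2 / epsilon.
    """
    counts = np.asarray(counts, dtype=np.float64)
    noise = rng.gumbel(loc=0.0, scale=2.0 / epsilon, size=counts.shape[0])
    return int(np.argmax(counts + noise))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 1000, 8, 3
    # Synthetic data: coordinate 5 is set more often than the others.
    probs = np.full(d, 0.2)
    probs[5] = 0.6
    data = (rng.random((n, d)) < probs).astype(np.int64)

    all_shares = np.stack([share_vector(row, k, rng) for row in data])
    total = aggregate_shares(all_shares)
    print("selected index:", noisy_argmax(total, epsilon=1.0, rng=rng))
```

A smaller `epsilon` means more noise and a higher chance of selecting a coordinate that is not the true maximum. That loss of accuracy is the utility cost the abstract refers to when comparing trust models.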

