Given a collection of vectors $x^{(1)},\dots,x^{(n)} \in \{0,1\}^d$, the selection problem asks for the index of an "approximately largest" entry of $x=\sum_{j=1}^n x^{(j)}$. Selection abstracts a host of problems: in machine learning it can be used for hyperparameter tuning, feature selection, or to model empirical risk minimization. We study selection under differential privacy, where the released index must preserve the privacy of each vector. Although selection can be solved with an excellent utility guarantee in the central model of differential privacy, the distributed setting lacks comparable solutions. Specifically, existing solutions offer strong privacy guarantees with high utility in high-trust settings, but not in low-trust settings. For example, in the popular shuffle model of distributed differential privacy, strong lower bounds suggest that the utility of the central model cannot be obtained. In this paper we design a protocol for differentially private selection in a trust setting similar to the shuffle model, with the crucial difference that our protocol tolerates corrupted servers while maintaining privacy. Our protocol uses techniques from secure multi-party computation (MPC) and: (i) has utility on par with the best mechanisms in the central model, (ii) scales to large, distributed collections of high-dimensional vectors, and (iii) uses $k\geq 3$ servers that collaborate to compute the result, with differential privacy holding under an honest-majority assumption. Since general-purpose MPC techniques are not sufficiently scalable, we propose a novel application of integer secret sharing, and we evaluate the utility and efficiency of our protocol both theoretically and empirically. Our protocol is the first to demonstrate that large-scale differentially private selection is possible in a distributed setting.
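To make the selection problem concrete, the following is a minimal sketch of a central-model baseline, not the distributed MPC protocol described above. It assumes a trusted curator who sees all vectors, and it uses the standard exponential mechanism, implemented by adding Gumbel noise of scale $2/\varepsilon$ to each coordinate of $x$ and reporting the argmax. The function names (`column_sums`, `gumbel`, `private_select`) and the parameter `epsilon` are hypothetical and chosen for illustration only.

```python
import math
import random


def column_sums(vectors):
    """Compute x = sum_j x^(j) for a list of equal-length 0/1 vectors."""
    d = len(vectors[0])
    sums = [0] * d
    for v in vectors:
        for i, bit in enumerate(v):
            sums[i] += bit
    return sums


def gumbel(rng, scale):
    """Draw one Gumbel(0, scale) sample by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -scale * math.log(-math.log(u))


def private_select(vectors, epsilon, rng=None):
    """Return the index of an approximately largest entry of x.

    Assumption: central model with a trusted curator. Changing one input
    vector changes each coordinate of x by at most 1, so Gumbel noise with
    scale 2/epsilon realizes the exponential mechanism with sensitivity 1.
    """
    rng = rng or random.Random()
    sums = column_sums(vectors)
    scale = 2.0 / epsilon
    noisy = [s + gumbel(rng, scale) for s in sums]
    return max(range(len(noisy)), key=noisy.__getitem__)


if __name__ == "__main__":
    data = [[1, 0, 0]] * 50 + [[0, 1, 0]] * 10
    print(private_select(data, epsilon=1.0))
```

When the largest count leads the runner-up by much more than $\ln(d)/\varepsilon$, this baseline returns the true argmax with high probability. That is the kind of central-model utility the abstract says a distributed protocol should match.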