Given a collection of vectors $x^{(1)},\dots,x^{(n)} \in \{0,1\}^d$, the selection problem asks to report the index of an "approximately largest" entry in $x=\sum_{j=1}^n x^{(j)}$. Selection abstracts a host of problems: in machine learning it can be used for hyperparameter tuning, feature selection, or to model empirical risk minimization. We study selection under differential privacy, where the released index guarantees privacy for each vector. Although selection can be solved with an excellent utility guarantee in the central model of differential privacy, the distributed setting lacks comparable solutions: strong privacy guarantees with high utility are available in high-trust settings, but not in low-trust settings. For example, in the popular shuffle model of distributed differential privacy, there are strong lower bounds suggesting that the utility of the central model cannot be obtained. In this paper we design a protocol for differentially private selection in a trust setting similar to the shuffle model, with the crucial difference that our protocol tolerates corrupted servers while maintaining privacy. The protocol uses techniques from secure multi-party computation (MPC) and (i) has utility on par with the best mechanisms in the central model, (ii) scales to large, distributed collections of high-dimensional vectors, and (iii) uses $k\geq 3$ servers that collaborate to compute the result, where differential privacy holds assuming an honest majority. Since general-purpose MPC techniques are not sufficiently scalable, we propose a novel application of integer secret sharing, and evaluate the utility and efficiency of our protocol theoretically and empirically. Our protocol is the first to demonstrate that large-scale differentially private selection is possible in a distributed setting.
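To make the selection problem concrete, the following is a minimal sketch of a standard central-model baseline, not the distributed protocol proposed in the paper. It assumes a trusted curator who sees all vectors, computes the column sums $x$, and releases the index of the largest entry after adding Gumbel noise, which is a well-known way to sample from the exponential mechanism. The function names `column_sums` and `private_select` and the parameter `epsilon` are illustrative choices, not taken from the paper.

```python
import numpy as np


def column_sums(vectors):
    """Return x = sum_j x^(j) for an (n, d) array of 0/1 vectors."""
    return np.asarray(vectors, dtype=np.int64).sum(axis=0)


def private_select(vectors, epsilon, rng=None):
    """Central-model selection sketch (assumes a trusted curator).

    Adds independent Gumbel noise of scale 2/epsilon to every count and
    returns the argmax. Each count changes by at most 1 when one vector
    is added, removed, or replaced, so this samples from the exponential
    mechanism with sensitivity 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = column_sums(vectors)
    noise = rng.gumbel(loc=0.0, scale=2.0 / epsilon, size=x.shape)
    return int(np.argmax(x + noise))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: n = 1000 users, d = 8 coordinates, sparse 0/1 entries.
    data = (rng.random((1000, 8)) < 0.1).astype(np.int64)
    data[:, 5] = 1  # make coordinate 5 the clear maximum
    print(private_select(data, epsilon=1.0, rng=rng))
```

In this baseline the curator sees every input vector in the clear. The low-trust setting described above is exactly the one where no single party is allowed to do so.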