We provide tools for sharing sensitive data when the data curator doesn't know in advance what questions an (untrusted) analyst might ask about the data. The analyst can specify a program that they want the curator to run on the dataset. We model the program as a black-box function $f$. We study differentially private algorithms, called privacy wrappers, that, given black-box access to a real-valued function $f$ and a sensitive dataset $x$, output an accurate approximation to $f(x)$. The dataset $x$ is modeled as a finite subset of a possibly infinite set $U$, in which each entry represents data of one individual. A privacy wrapper calls $f$ on the dataset $x$ and on some subsets of $x$ and returns either an approximation to $f(x)$ or a nonresponse symbol $\perp$. The wrapper may also use additional information (that is, parameters) provided by the analyst, but differential privacy is required for all values of these parameters. Correct setting of these parameters will ensure better accuracy of the wrapper. The bottleneck in the running time of our wrappers is the number of calls to $f$, which we refer to as queries. Our goal is to design wrappers with high accuracy and low query complexity. We introduce a novel setting, the automated sensitivity detection setting, where the analyst supplies the black-box function $f$ and the intended (finite) range of $f$. In the previously considered setting, the claimed sensitivity bound setting, the analyst supplies additional parameters that describe the sensitivity of $f$. We design privacy wrappers for both settings and show that our wrappers are nearly optimal in terms of accuracy, locality (i.e., the depth of the local neighborhood of the dataset $x$ they explore), and query complexity. In the claimed sensitivity bound setting, we provide the first accuracy guarantees that have no dependence on the size of the universe $U$.
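For concreteness, the privacy requirement stated above can be written out as follows. This is a sketch of the standard $(\varepsilon,\delta)$-differential privacy condition, not a formula taken from the abstract: the symbols $\mathcal{W}$ (the wrapper), $\theta$ (the analyst-supplied parameters), $\varepsilon$, $\delta$, and the add/remove notion of neighboring datasets are assumptions introduced here for illustration. For every function $f$, every parameter setting $\theta$, every pair of datasets $x, x' \subseteq U$ where one is obtained from the other by adding or removing one individual's entry, and every set $S$ of possible outputs (including $\perp$),

$$\Pr\left[\mathcal{W}^{f}(x,\theta) \in S\right] \;\le\; e^{\varepsilon}\,\Pr\left[\mathcal{W}^{f}(x',\theta) \in S\right] + \delta .$$

Read this way, the quantifier over all $\theta$ is what keeps the guarantee intact when the analyst's parameters are set incorrectly; only the accuracy of the wrapper depends on those parameters being correct.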