Adaptive Data Analysis in a Balanced Adversarial Model

In adaptive data analysis, a mechanism gets $n$ i.i.d. samples from an unknown distribution $D$ and is required to provide accurate estimates for a sequence of adaptively chosen statistical queries with respect to $D$. Hardt and Ullman (FOCS 2014) and Steinke and Ullman (COLT 2015) showed that, in general, it is computationally hard to answer more than $\Theta(n^2)$ adaptive queries, assuming the existence of one-way functions. However, these negative results rely heavily on an adversarial model that significantly favors the adversarial analyst over the mechanism: the analyst, who chooses the adaptive queries, also chooses the underlying distribution $D$. This imbalance raises questions about the applicability of the obtained hardness results: an analyst who has complete knowledge of the underlying distribution $D$ would have little need, if any, to issue statistical queries to a mechanism that holds only a finite number of samples from $D$. We consider more restricted adversaries, called \emph{balanced}, where each such adversary consists of two separate algorithms: the \emph{sampler}, which chooses the distribution and provides the samples to the mechanism, and the \emph{analyst}, which chooses the adaptive queries but has no prior knowledge of the underlying distribution. We improve the quality of previous lower bounds by revisiting them using an efficient \emph{balanced} adversary, under standard public-key cryptography assumptions. We show that these stronger hardness assumptions are unavoidable, in the sense that any computationally bounded \emph{balanced} adversary that has the structure of all known attacks implies the existence of public-key cryptography.
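To make the interaction in the balanced model concrete, the following is a minimal toy sketch of the game between a sampler, a mechanism, and an analyst. It is not the construction or the attack from this work. The class and function names (`Sampler`, `EmpiricalMechanism`, `Analyst`, `run_game`), the Bernoulli distribution, the naive empirical-mean mechanism, and the analyst's query rule are all assumptions chosen only for illustration. The one structural point it is meant to show is the separation of roles: only the sampler knows $D$, and the analyst sees nothing but the mechanism's answers.

```python
import random
from typing import Callable, List

# A statistical query maps a sample point to a value in [0, 1].
Query = Callable[[int], float]


class Sampler:
    """Chooses the distribution D and draws the samples (toy: Bernoulli(p) on {0, 1})."""

    def __init__(self, p: float, rng: random.Random) -> None:
        self.p = p
        self.rng = rng

    def draw(self, n: int) -> List[int]:
        return [1 if self.rng.random() < self.p else 0 for _ in range(n)]

    def true_value(self, q: Query) -> float:
        # Exact expectation of q under D; used only to score the mechanism.
        return self.p * q(1) + (1.0 - self.p) * q(0)


class EmpiricalMechanism:
    """Naive mechanism (an assumption): answers with the empirical mean on its samples."""

    def __init__(self, samples: List[int]) -> None:
        self.samples = samples

    def answer(self, q: Query) -> float:
        return sum(q(x) for x in self.samples) / len(self.samples)


class Analyst:
    """Chooses each query from past answers only; it never sees D or the samples."""

    def next_query(self, history: List[float]) -> Query:
        # Hypothetical adaptive rule: scale the next query by the previous answer.
        w = min(1.0, max(0.0, history[-1])) if history else 1.0
        return lambda x: w * x


def run_game(n: int, k: int, p: float, seed: int) -> float:
    """Play k adaptive rounds and return the mechanism's largest error against D."""
    rng = random.Random(seed)
    sampler = Sampler(p, rng)
    mechanism = EmpiricalMechanism(sampler.draw(n))
    analyst = Analyst()
    history: List[float] = []
    max_error = 0.0
    for _ in range(k):
        q = analyst.next_query(history)
        a = mechanism.answer(q)
        max_error = max(max_error, abs(a - sampler.true_value(q)))
        history.append(a)
    return max_error
```

Calling `run_game(n=2000, k=5, p=0.3, seed=0)` returns the largest gap between the mechanism's answers and the true query values over five rounds. In the unbalanced model of the earlier lower bounds, the analyst would additionally be handed the sampler's choice of distribution (here, `p`).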