Adaptive Data Analysis in a Balanced Adversarial Model

In adaptive data analysis, a mechanism gets $n$ i.i.d. samples from an unknown distribution $D$ and is required to provide accurate estimates for a sequence of adaptively chosen statistical queries with respect to $D$. Hardt and Ullman (FOCS 2014) and Steinke and Ullman (COLT 2015) showed that, in general, it is computationally hard to answer more than $\Theta(n^2)$ adaptive queries, assuming the existence of one-way functions. However, these negative results rely strongly on an adversarial model that significantly advantages the adversarial analyst over the mechanism: the analyst, who chooses the adaptive queries, also chooses the underlying distribution $D$. This imbalance raises questions about the applicability of the obtained hardness results. An analyst who has complete knowledge of the underlying distribution $D$ would have little, if any, need to issue statistical queries to a mechanism that holds only a finite number of samples from $D$. We consider more restricted adversaries, called \emph{balanced}, where each such adversary consists of two separate algorithms: the \emph{sampler}, which chooses the distribution and provides the samples to the mechanism, and the \emph{analyst}, which chooses the adaptive queries but has no prior knowledge of the underlying distribution (and hence no a priori advantage over the mechanism). We improve on previous lower bounds by revisiting them with an efficient \emph{balanced} adversary, under standard public-key cryptography assumptions. We show that these stronger hardness assumptions are unavoidable, in the sense that any computationally bounded \emph{balanced} adversary that has the structure of all known attacks implies the existence of public-key cryptography.
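The separation between sampler and analyst can be made concrete as an interaction game. The following is a minimal sketch under illustrative assumptions, not the construction from the paper: the names `Sampler`, `EmpiricalMechanism`, and `run_game` are hypothetical; the distribution is taken to be uniform over a secret subset of a small integer domain; the mechanism is a naive empirical-mean responder used only as a placeholder; and the adversary is assumed to win if any answer deviates from the true value on $D$ by more than $\alpha$. The point it shows is structural: the analyst is a function of the query/answer transcript only, so it never sees the sampler's choice of $D$.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A statistical query maps a data point to [0, 1]; its true value is E_{x~D}[q(x)].
Query = Callable[[int], float]
Transcript = List[Tuple[Query, float]]


class Sampler:
    """Hypothetical sampler: chooses D (here, uniform over a secret subset of
    {0, ..., domain_size - 1}, an illustrative assumption) and draws samples."""

    def __init__(self, domain_size: int, support_size: int, rng: random.Random):
        self.support = rng.sample(range(domain_size), support_size)
        self.rng = rng

    def draw(self, n: int) -> List[int]:
        # n i.i.d. samples from D.
        return [self.rng.choice(self.support) for _ in range(n)]

    def true_answer(self, q: Query) -> float:
        # Exact value of q on D, used only to judge accuracy.
        return sum(q(x) for x in self.support) / len(self.support)


class EmpiricalMechanism:
    """Placeholder mechanism: answers each query with its empirical mean."""

    def __init__(self, sample: Sequence[int]):
        self.sample = list(sample)

    def answer(self, q: Query) -> float:
        return sum(q(x) for x in self.sample) / len(self.sample)


def run_game(sampler: Sampler,
             analyst: Callable[[Transcript], Query],
             n: int, k: int, alpha: float) -> bool:
    """Returns True if the balanced adversary wins, i.e. some answer is off by more than alpha."""
    mechanism = EmpiricalMechanism(sampler.draw(n))  # the sampler talks only to the mechanism
    transcript: Transcript = []
    for _ in range(k):
        q = analyst(transcript)  # the analyst sees past queries and answers, never D
        a = mechanism.answer(q)
        transcript.append((q, a))
        if abs(a - sampler.true_answer(q)) > alpha:
            return True
    return False


# Usage: a trivial analyst that always asks the constant query q(x) = 1.
s = Sampler(domain_size=4, support_size=4, rng=random.Random(0))
print(run_game(s, lambda transcript: (lambda x: 1.0), n=10, k=5, alpha=0.1))  # False
```

In the unbalanced model of the earlier lower bounds, the analyst and the sampler would be a single algorithm that can read `sampler.support` directly; in this sketch that access is removed by construction. The hardness results above concern what any computationally efficient mechanism can achieve, not this naive placeholder.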

翻译：在自适应数据分析中，机制从未知分布 $D$ 中获得 $n$ 个独立同分布样本，并需对一系列自适应选择的关于 $D$ 的统计查询提供精确估计。Hardt 与 Ullman（FOCS 2014）及 Steinke 与 Ullman（COLT 2015）证明：在一般情形下，若假设单向函数存在，则回答超过 $\Theta(n^2)$ 个自适应查询在计算上是困难的。然而，这些负面结果强烈依赖于一种显著有利于敌手分析者而非机制的敌手模型——选择自适应查询的分析者同时也选择底层分布 $D$。这种不平衡引发了关于所获困难结果适用性的质疑：一个完全了解底层分布 $D$ 的分析者几乎无需向仅持有 $D$ 有限样本的机制发起统计查询。我们考虑更具限制性的敌手，称为**平衡敌手**，其中每个敌手由两个分离的算法组成：**采样器**（负责选择分布并向机制提供样本的实体）与**分析者**（负责选择自适应查询，但对底层分布无先验知识，因此相对于机制不具备先验优势）。我们通过利用有效的**平衡**敌手，在标准公钥密码学假设下重新审视先前下界，提升其质量。我们证明：这些更强的困难假设是不可避免的——任何具有已知攻击结构的计算有界**平衡**敌手，均蕴含公钥密码学的存在。