The study of adaptive data analysis examines how many statistical queries can be answered accurately using a fixed dataset while avoiding false discoveries (statistically inaccurate answers). In this paper, we tackle a question that precedes this field of study: Is data valuable only when it provides accurate answers to statistical queries? To answer this question, we use Stochastic Convex Optimization as a case study. In this model, algorithms are viewed as analysts who, at each iteration, query an estimate of the gradient of a noisy function and move towards its minimizer. It is known that $O(1/\epsilon^2)$ examples suffice to minimize the objective function, but none of the existing methods depends on the accuracy of the estimated gradients along the trajectory. Therefore, we ask: How many samples are needed to minimize a noisy convex function if we require $\epsilon$-accurate estimates of $O(1/\epsilon^2)$ gradients? Or, might it be that inaccurate gradient estimates are \emph{necessary} for finding the minimum of a stochastic convex function at an optimal statistical rate? We provide two partial answers to this question. First, we show that a general analyst (one whose queries may be maliciously chosen) requires $\Omega(1/\epsilon^3)$ samples, ruling out the possibility of a foolproof mechanism. Second, we show that, under certain assumptions on the oracle, $\tilde \Omega(1/\epsilon^{2.5})$ samples are necessary for gradient descent to interact with the oracle. Our results stand in contrast to classical bounds, which show that $O(1/\epsilon^2)$ samples can optimize the population risk to an accuracy of $O(\epsilon)$, but with spurious gradients.
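To make the interaction model concrete, the following is a minimal sketch of gradient descent acting as an analyst that queries a gradient oracle built from a fixed sample. It is an illustration only and not the construction analyzed in the paper: the quadratic loss $f(w; z) = \frac{1}{2}\|w - z\|^2$, the Gaussian data model, and all function names (`make_dataset`, `empirical_gradient_oracle`, `gradient_descent`, `max_gradient_error`) are assumptions chosen so that the population gradient is known in closed form and the accuracy of each answered query can be measured along the trajectory.

```python
import random


def make_dataset(n, mu, noise=1.0, seed=0):
    """Draw n i.i.d. points z = mu + Gaussian noise (assumed toy data model)."""
    rng = random.Random(seed)
    return [[m + rng.gauss(0.0, noise) for m in mu] for _ in range(n)]


def empirical_gradient_oracle(sample):
    """Return an oracle that answers gradient queries from a fixed sample.

    For the assumed loss f(w; z) = 0.5 * ||w - z||^2, the empirical gradient
    at w is w minus the sample mean.
    """
    n = len(sample)
    d = len(sample[0])
    mean = [sum(z[j] for z in sample) / n for j in range(d)]

    def oracle(w):
        return [w[j] - mean[j] for j in range(d)]

    return oracle


def gradient_descent(oracle, w0, step_size, num_steps):
    """Run gradient descent, querying the oracle once per iteration."""
    w = list(w0)
    trajectory = [list(w)]
    for _ in range(num_steps):
        g = oracle(w)  # one adaptive query: it depends on all previous answers
        w = [wj - step_size * gj for wj, gj in zip(w, g)]
        trajectory.append(list(w))
    return trajectory


def population_gradient(w, mu):
    """Gradient of the population risk E[f(w; z)], which equals w - mu here."""
    return [wj - mj for wj, mj in zip(w, mu)]


def max_gradient_error(trajectory, oracle, mu):
    """Largest Euclidean gap between oracle answers and true gradients."""
    worst = 0.0
    for w in trajectory:
        g_hat = oracle(w)
        g = population_gradient(w, mu)
        err = sum((a - b) ** 2 for a, b in zip(g_hat, g)) ** 0.5
        worst = max(worst, err)
    return worst


if __name__ == "__main__":
    mu = [1.0, -2.0]
    oracle = empirical_gradient_oracle(make_dataset(400, mu))
    traj = gradient_descent(oracle, [0.0, 0.0], step_size=0.5, num_steps=50)
    print("final iterate:", traj[-1])
    print("max gradient error:", max_gradient_error(traj, oracle, mu))
```

In this toy problem the gradient error is the same at every query point, so it does not exhibit the gap between optimization accuracy and gradient accuracy that the abstract describes. The sketch is only meant to show the protocol: a fixed dataset, an oracle that answers adaptively chosen gradient queries, and a way to audit each answer against the population gradient.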