Estimation and inference in statistics pose significant challenges when data are collected adaptively. Even in linear models, the Ordinary Least Squares (OLS) estimator may fail to be asymptotically normal when estimating a single coordinate, and its error can be inflated. A recent minimax lower bound highlights this issue. When data may be collected in an arbitrarily adaptive manner, the error of estimating a single coordinate can be larger by a factor of $\sqrt{d}$ than in the i.i.d. case. Our work explores this striking gap in estimation performance between i.i.d. and adaptive data. We investigate how the degree of adaptivity in data collection affects the estimation of a low-dimensional parameter component in high-dimensional linear models. We identify conditions on the data collection mechanism under which the estimation error for a low-dimensional parameter component matches its i.i.d. counterpart, up to a factor that depends on the degree of adaptivity. We show that OLS, or OLS applied to centered data, can achieve this matching error. In addition, we propose a novel estimator for single-coordinate inference, obtained by solving a Two-stage Adaptive Linear Estimating equation (TALE). Under a weaker form of adaptivity in data collection, we establish asymptotic normality of the proposed estimator.
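To make the single-coordinate setting concrete, the sketch below fits OLS to a linear model $y_t = x_t^\top \beta + \varepsilon_t$ and records the standardized error of the first coordinate over repeated runs, once with i.i.d. Gaussian covariates and once with a design that depends on past responses. The adaptive rule used here (rescaling the first covariate by 2 or 0.5 according to the sign of the running sum of past responses) is a hypothetical toy choice for illustration only. It is not the worst-case construction behind the lower bound, nor the TALE estimator, and the function name `simulate_ols_coordinate` is likewise an assumption.

```python
import numpy as np


def simulate_ols_coordinate(n=200, d=20, reps=500, adaptive=True, seed=0):
    """Return standardized OLS errors for coordinate 0 across `reps` runs.

    Hypothetical adaptive rule (illustrative only): the scale of the first
    covariate at time t depends on the sign of the sum of past responses,
    so the design is chosen using previously observed data.
    """
    rng = np.random.default_rng(seed)
    beta = np.zeros(d)  # true parameter, set to zero for simplicity
    z_stats = np.empty(reps)
    for r in range(reps):
        X = np.empty((n, d))
        y = np.empty(n)
        running = 0.0  # sum of responses observed so far
        for t in range(n):
            x = rng.standard_normal(d)
            if adaptive:
                # The scale uses only past responses (a predictable choice).
                x[0] *= 2.0 if running > 0 else 0.5
            X[t] = x
            y[t] = x @ beta + rng.standard_normal()
            running += y[t]
        # OLS fit and the usual (i.i.d.-based) standard error for coordinate 0.
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta_hat
        sigma2 = resid @ resid / (n - d)
        se0 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
        z_stats[r] = (beta_hat[0] - beta[0]) / se0
    return z_stats
```

Comparing, for example, `np.mean(np.abs(z) > 1.96)` for `adaptive=True` and `adaptive=False` against the nominal 0.05 gives a rough empirical check of normal-based inference. How far the adaptive case departs from it depends on the rule chosen, and this toy rule may show little or no effect.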