Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under very weak assumptions, and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals. Specifically, our methods take the form of confidence sequences (CSs), that is, sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times, incurring no penalties for "peeking" at the data, unlike classical confidence intervals, which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, and hence do not enjoy the aforementioned broad applicability of asymptotic confidence intervals. Our work bridges this gap by giving a definition of "asymptotic CSs" and deriving a universal asymptotic CS that requires only weak CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles (stemming from the seminal 1960s work of Strassen and improvements by Komlós, Major, and Tusnády) to uniformly approximate the entire sample average process by an implicit Gaussian process. We demonstrate the utility of asymptotic CSs by deriving nonparametric asymptotic CSs for the average treatment effect based on doubly robust estimators in observational studies, for which no nonasymptotic methods can exist even in the fixed-time regime. This enables causal inference that can be continuously monitored and adaptively stopped.
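To make the "peeking" problem concrete, the following is a minimal simulation sketch; it is not taken from the paper and does not implement its asymptotic CS. It assumes i.i.d. standard Gaussian observations with true mean 0, a nominal 95% CLT interval (z = 1.96), and monitoring at every sample size from 10 up to a horizon of 500. The function name `peeking_miscoverage` and all of its parameters are hypothetical choices made for illustration. The sketch compares how often the fixed-sample-size interval misses the true mean at the final sample size with how often it misses at least once while being monitored.

```python
import math
import random


def peeking_miscoverage(n_runs=400, horizon=500, z=1.96, seed=0):
    """Estimate two miscoverage rates of a fixed-n CLT interval:
    (a) at the final sample size only, and
    (b) at any sample size between 10 and the horizon ("peeking").

    Illustrative assumptions: i.i.d. N(0, 1) data, true mean 0.
    """
    rng = random.Random(seed)
    true_mean = 0.0
    miss_at_end = 0
    miss_ever = 0
    for _ in range(n_runs):
        total = 0.0      # running sum of observations
        total_sq = 0.0   # running sum of squared observations
        missed = False       # has the interval missed at any look so far?
        missed_now = False   # does the interval miss at the current look?
        for t in range(1, horizon + 1):
            x = rng.gauss(true_mean, 1.0)
            total += x
            total_sq += x * x
            if t < 10:
                continue  # skip very small samples before the first look
            mean = total / t
            # Unbiased sample variance, clipped at 0 against rounding error.
            var = max((total_sq - t * mean * mean) / (t - 1), 0.0)
            half_width = z * math.sqrt(var / t)
            missed_now = abs(mean - true_mean) > half_width
            missed = missed or missed_now
        miss_at_end += missed_now  # status at the final sample size
        miss_ever += missed        # status over the whole monitored path
    return miss_at_end / n_runs, miss_ever / n_runs


if __name__ == "__main__":
    fixed_rate, ever_rate = peeking_miscoverage()
    print(f"miscoverage at final n only:     {fixed_rate:.3f}")
    print(f"miscoverage at some monitored n: {ever_rate:.3f}")
```

Under these assumptions, the first rate should sit near the nominal 5% level, while the second should be substantially larger, since each additional look gives the interval another chance to exclude the true mean. A confidence sequence is designed so that the second, path-wise error rate is the one being controlled.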