Statistical inference in contextual bandits is challenging due to the adaptive, non-i.i.d. nature of the data. A growing body of work shows that classical least-squares inference can fail under adaptive sampling, and that valid confidence intervals for linear functionals typically require an inflation of order $\sqrt{d \log T}$. This phenomenon -- often termed the price of adaptivity -- reflects the intrinsic difficulty of reliable inference under general contextual bandit policies. A key structural condition that overcomes this limitation is the stability condition of Lai and Wei, which requires the empirical feature covariance to converge to a deterministic limit. When stability holds, the ordinary least-squares estimator satisfies a central limit theorem, and classical Wald-type confidence intervals remain asymptotically valid under adaptation, without incurring the $\sqrt{d \log T}$ price of adaptivity. In this paper, we propose and analyze a regularized EXP4 algorithm for linear contextual bandits. Our first main result shows that this procedure satisfies the Lai--Wei stability condition and therefore admits valid Wald-type confidence intervals for linear functionals. We additionally provide quantitative rates of convergence in the associated central limit theorem. Our second result establishes that the same algorithm achieves regret guarantees that are minimax optimal up to logarithmic factors, demonstrating that stability and statistical efficiency can coexist within a single contextual bandit method. As an application of our theory, we show how it can be used to construct confidence intervals for the conditional average treatment effect (CATE) under adaptively collected data. Finally, we complement our theory with simulations illustrating the empirical normality of the resulting estimators and the sharpness of the corresponding confidence intervals.
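For concreteness, the display below is a rough sketch of the standard form of the stability condition and of the Wald-type interval it licenses; the notation (chosen feature vectors $X_t$, noise level $\sigma^2$, direction $c$, parameter $\theta^\star$) is generic shorthand for expository purposes and is not taken verbatim from the paper.
\[
  \frac{1}{T}\sum_{t=1}^{T} X_t X_t^\top \;\overset{p}{\longrightarrow}\; \Sigma \succ 0
  \quad\Longrightarrow\quad
  \sqrt{T}\,\bigl(\hat\theta_{\mathrm{OLS}} - \theta^\star\bigr)
  \;\overset{d}{\longrightarrow}\; \mathcal{N}\!\bigl(0,\, \sigma^2 \Sigma^{-1}\bigr),
\]
so that the interval $c^\top \hat\theta_{\mathrm{OLS}} \pm z_{1-\alpha/2}\,\hat\sigma \sqrt{c^\top \bigl(\sum_{t \le T} X_t X_t^\top\bigr)^{-1} c}$ covers $c^\top \theta^\star$ with asymptotic probability $1-\alpha$, without any $\sqrt{d \log T}$ widening.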
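As a minimal illustration of the kind of simulation referred to in the final sentence, the sketch below runs a two-armed linear contextual bandit with a simple epsilon-greedy policy (an assumed stand-in whose feature covariance stabilizes, not the regularized EXP4 procedure analyzed in the paper) and checks the empirical coverage of a coordinate-wise Wald interval; all parameter values are illustrative assumptions.

import numpy as np

# Monte Carlo sketch: two-armed linear contextual bandit with epsilon-greedy
# sampling, followed by a Wald-type z-statistic for coordinate 0 of the OLS
# estimate. Under a policy whose empirical feature covariance stabilizes,
# the studentized errors should be approximately standard normal.
rng = np.random.default_rng(0)
d, T, n_reps, eps = 3, 1000, 300, 0.2
theta_star = np.array([1.0, -0.5, 0.25])   # true parameter (illustrative)

z_stats = []
for _ in range(n_reps):
    X_rows, y = [], []
    theta_hat = np.zeros(d)
    for t in range(T):
        ctx = rng.normal(size=d - 1)                              # raw context
        feats = [np.concatenate(([a], ctx)) for a in (0.0, 1.0)]  # arm features
        if rng.random() < eps:                                    # forced exploration
            a = rng.integers(2)
        else:                                                     # greedy on current fit
            a = int(feats[1] @ theta_hat > feats[0] @ theta_hat)
        x = feats[a]
        y_t = x @ theta_star + rng.normal()
        X_rows.append(x)
        y.append(y_t)
        if t >= d and t % 50 == 0:                                # periodic refit
            theta_hat = np.linalg.lstsq(np.array(X_rows), np.array(y), rcond=None)[0]
    X, y = np.array(X_rows), np.array(y)
    G = X.T @ X
    theta_hat = np.linalg.solve(G, X.T @ y)
    resid = y - X @ theta_hat
    sigma2 = resid @ resid / (T - d)
    se = np.sqrt(sigma2 * np.linalg.inv(G)[0, 0])                 # Wald s.e. for coord 0
    z_stats.append((theta_hat[0] - theta_star[0]) / se)

z = np.array(z_stats)
print(f"empirical coverage of 95% Wald CI: {np.mean(np.abs(z) <= 1.96):.3f}")

With these settings the reported coverage should sit near the nominal 95% level, which is the qualitative behaviour the simulations in the paper are described as illustrating; the script is a sketch, not a reproduction of those experiments.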