Kernel methods are widely used in causal inference for tasks such as treatment effect estimation, policy evaluation, and policy learning. The bootstrap is a standard tool for uncertainty quantification because of its broad applicability. As increasingly large datasets become available, such as the 2023 U.S. Natality data from the National Vital Statistics System (NVSS), which includes 3,596,017 registered births, the computational demands of these methods increase substantially. Kernel methods scale poorly with sample size, and the repeated refitting required by the bootstrap compounds this limitation. As a result, bootstrap-based inference for kernel-based estimators can become computationally infeasible in large-scale settings. In this paper, we address these challenges by extending the causal Bag of Little Bootstraps (cBLB) algorithm to kernel methods. Our approach achieves computational scalability by combining subsampling and resampling while preserving first-order uncertainty quantification and asymptotically correct coverage. We evaluate the method on three representative implementations: kernelized augmented outcome-weighted learning, kernel-based minimax weighting, and double machine learning with kernel support vector machines. In simulations, our method yields confidence intervals with nominal coverage at a fraction of the computational cost of the standard bootstrap. We further demonstrate its utility in a real-world application, using the NVSS dataset to estimate the effect of any amount of smoking on birth weight, as well as the optimal treatment regime. At this scale, the standard bootstrap is prohibitively expensive.
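To make the idea of combining subsampling and resampling concrete, the following is a minimal sketch of a generic Bag of Little Bootstraps procedure in Python. It is not the cBLB algorithm developed in this paper: the toy estimator (a weighted difference in means under randomized treatment), the function names, and the tuning values (`n_subsets`, `gamma`, `n_resamples`) are all assumptions made for illustration. Each small subset of size b = n^gamma is reweighted with multinomial counts that sum to n, so every fit touches only b distinct points while mimicking a size-n bootstrap sample.

```python
import numpy as np


def weighted_diff_in_means(y, t, w):
    """Toy estimator (assumption): weighted mean outcome, treated minus control."""
    treated = t == 1
    return (np.average(y[treated], weights=w[treated])
            - np.average(y[~treated], weights=w[~treated]))


def blb_confidence_interval(y, t, n_subsets=10, gamma=0.7,
                            n_resamples=100, alpha=0.05, seed=0):
    """Generic BLB percentile interval, averaged over subsets (illustrative)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    b = int(n ** gamma)  # subset size, much smaller than n
    lowers, uppers = [], []
    for _ in range(n_subsets):
        # Subsampling step: draw b distinct observations.
        idx = rng.choice(n, size=b, replace=False)
        y_sub, t_sub = y[idx], t[idx]
        estimates = np.empty(n_resamples)
        for k in range(n_resamples):
            # Resampling step: multinomial counts over the b points, total n.
            w = rng.multinomial(n, np.full(b, 1.0 / b))
            estimates[k] = weighted_diff_in_means(y_sub, t_sub, w)
        lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
        lowers.append(lo)
        uppers.append(hi)
    # Aggregate by averaging the interval endpoints across subsets.
    return float(np.mean(lowers)), float(np.mean(uppers))


# Simulated data (assumption): randomized treatment, true effect 2.0.
data_rng = np.random.default_rng(42)
n_obs = 20_000
t_sim = data_rng.integers(0, 2, size=n_obs)
y_sim = 1.0 + 2.0 * t_sim + data_rng.normal(size=n_obs)

ci_low, ci_high = blb_confidence_interval(y_sim, t_sim)
print(f"BLB 95% interval for the effect: ({ci_low:.3f}, {ci_high:.3f})")
```

In this sketch the estimator is cheap, so the savings are negligible. The design matters when each fit is expensive, as with kernel estimators whose cost grows quickly in the number of distinct observations, because every refit then operates on b points rather than n.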