We study inference on a low-dimensional functional $β$ in the presence of infinite-dimensional nuisance parameters. Classical inferential methods are typically based on Wald intervals, whose large-sample validity rests on the asymptotic negligibility of the nuisance-estimation error; for example, influence-curve-based estimators (Double/Debiased Machine Learning, DML) are asymptotically Gaussian when the nuisance estimators converge faster than $n^{-1/4}$. Although such negligibility can hold even over nonparametric classes, it can be restrictive in practice. To relax this requirement, we propose Perturbed Double Machine Learning, which delivers valid inference even when the nuisance estimators converge more slowly than $n^{-1/4}$. Our proposal is to (i) inject randomness into the nuisance estimation step to generate perturbed nuisance models, each yielding an estimate of $β$ and a Wald interval, and (ii) filter out perturbations whose deviations from the original DML estimate exceed a threshold. For Lasso nuisance learners, we show that, with high probability, at least one perturbation yields nuisance estimates sufficiently close to the truth, so the associated estimator of $β$ is close to an oracle with known nuisances. The union of the retained intervals delivers valid coverage even when the DML estimator converges more slowly than $n^{-1/2}$. The framework extends to general machine-learning nuisance learners, and simulations demonstrate coverage in settings where state-of-the-art methods fail.
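The two-step procedure described above can be sketched in code. The following is a minimal illustrative sketch, not the paper's exact algorithm: it uses a partially linear model, a hand-rolled coordinate-descent Lasso for the nuisance regressions, random penalty levels as the injected perturbation (one possible randomization scheme, chosen here for simplicity), a deviation threshold of a few baseline standard errors, and reports the hull of the retained Wald intervals as a stand-in for their union. All function names, tuning constants, and the simulated design are assumptions for illustration.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + alpha*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]      # partial residual w.r.t. j
            rho = X[:, j] @ r_j
            b[j] = np.sign(rho) * max(abs(rho) - n * alpha, 0.0) / sq[j]
    return b

def dml_fit(Y, D, X, alpha):
    """Cross-fitted partialling-out DML estimate of beta with Lasso nuisances."""
    n = len(Y)
    idx = np.arange(n)
    folds = [idx[: n // 2], idx[n // 2:]]
    v = np.empty(n)   # residuals D - m_hat(X)
    u = np.empty(n)   # residuals Y - l_hat(X)
    for k in (0, 1):
        tr, te = folds[1 - k], folds[k]
        bm = lasso_cd(X[tr], D[tr], alpha)        # m(X) = E[D|X]
        bl = lasso_cd(X[tr], Y[tr], alpha)        # l(X) = E[Y|X]
        v[te] = D[te] - X[te] @ bm
        u[te] = Y[te] - X[te] @ bl
    beta = (v @ u) / (v @ v)
    se = np.sqrt(np.sum((v * (u - beta * v)) ** 2)) / (v @ v)  # sandwich SE
    return beta, se

# Simulated sparse partially linear model with true beta = 1.
rng = np.random.default_rng(0)
n, p = 400, 20
X = rng.standard_normal((n, p))
gamma = np.zeros(p); gamma[:3] = 1.0
theta = np.zeros(p); theta[:3] = 1.0
D = X @ gamma + rng.standard_normal(n)
Y = 1.0 * D + X @ theta + rng.standard_normal(n)

beta0, se0 = dml_fit(Y, D, X, alpha=0.05)        # baseline DML estimate

# Step (i): perturb the nuisance estimation step. Here the randomness is a
# log-normal jitter of the Lasso penalty -- an illustrative choice only.
cands = []
for _ in range(20):
    a = 0.05 * np.exp(rng.normal(0.0, 0.5))
    cands.append(dml_fit(Y, D, X, alpha=a))

# Step (ii): filter out perturbations far from the baseline DML estimate,
# then combine the retained Wald intervals (hull shown for simplicity).
tau = 3 * se0
kept = [(b, s) for b, s in cands if abs(b - beta0) <= tau]
lo = min(b - 1.96 * s for b, s in kept)
hi = max(b + 1.96 * s for b, s in kept)
print(round(beta0, 3), round(lo, 3), round(hi, 3))
```

The retained-interval hull is deliberately wider than any single Wald interval, which is the mechanism by which coverage survives slow nuisance rates in this sketch.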