Single-parameter summaries of variable effects are desirable for ease of interpretation, but linear models, which would deliver these, may fit the data poorly. A modern approach is to estimate the average partial effect, that is, the average slope of the regression function with respect to the predictor of interest, using a doubly robust semiparametric procedure. Most existing work has focused on specific forms of nuisance function estimators. We extend the scope to arbitrary plug-in nuisance function estimation, allowing for the use of modern machine learning methods, which in particular may deliver non-differentiable regression function estimates. Our procedure involves resmoothing a user-chosen first-stage regression estimator to produce a differentiable version, and modelling the conditional distribution of the predictors through a location-scale model. We show that our proposals lead to a semiparametric efficient estimator under relatively weak assumptions. Our theory makes use of a new result on the sub-Gaussianity of Lipschitz score functions that may be of independent interest. We demonstrate the attractive numerical performance of our approach in a variety of settings, including ones with misspecification.
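To make the ingredients named above concrete, the following is a minimal sketch rather than the procedure itself. It assumes the standard doubly robust form for an average derivative, namely the sample mean of d/dx f(X, Z) - rho(X, Z) * (Y - f(X, Z)), where rho is the derivative in x of the log conditional density of X given Z. Resmoothing is illustrated by Gaussian convolution evaluated with Gauss-Hermite quadrature, and the location-scale model is illustrated with standard normal errors. The function names (`resmooth`, `location_scale_score`, `ape_estimate`), the bandwidth `h`, the Gaussian error assumption, the rounded first-stage fit, and the simulated data are all hypothetical choices made for illustration. Sample splitting and the estimation of the location, scale and error distribution are omitted.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def resmooth(f_hat, x, z, h, n_nodes=20):
    """Gaussian resmoothing of f_hat in its first argument (illustrative).

    Approximates g(x, z) = E[f_hat(x + h*W, z)] with W ~ N(0, 1), and its
    x-derivative E[W * f_hat(x + h*W, z)] / h, by Gauss-Hermite quadrature.
    The quadrature is only approximate when f_hat is discontinuous.
    """
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()  # rescale to N(0, 1) probabilities
    value = np.zeros_like(x, dtype=float)
    deriv = np.zeros_like(x, dtype=float)
    for w, p in zip(nodes, weights):
        shifted = f_hat(x + h * w, z)
        value += p * shifted
        deriv += p * w * shifted / h
    return value, deriv


def location_scale_score(x, m, s):
    """d/dx log p(x | z) when X = m(Z) + s(Z) * eps.

    ASSUMPTION: eps ~ N(0, 1); other error laws give a different score.
    """
    return -(x - m) / s**2


def ape_estimate(y, f_vals, df_vals, score_vals):
    """Doubly robust average-partial-effect estimate and its standard error."""
    psi = df_vals - score_vals * (y - f_vals)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(psi))


# Simulated example (hypothetical data-generating process).
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)             # location 0.5*z, scale 1
y = x + 0.5 * x**2 + z + 0.5 * rng.normal(size=n)

# Stand-in for a non-differentiable first-stage fit: a step function in x.
def f_hat(x, z):
    xr = np.round(x, 1)
    return xr + 0.5 * xr**2 + z

g_vals, dg_vals = resmooth(f_hat, x, z, h=0.3)
rho = location_scale_score(x, m=0.5 * z, s=1.0)
theta_hat, se = ape_estimate(y, g_vals, dg_vals, rho)
print(f"APE estimate: {theta_hat:.3f} (SE {se:.3f})")
```

In this simulation the regression function has slope 1 + x in x and X has mean zero, so the target average partial effect is 1. In practice the step-function stand-in would be replaced by a fitted machine learning regression, and the location, scale and error score would themselves be estimated from data.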