The growing demand for personalized decision-making has driven a surge of interest in estimating the Conditional Average Treatment Effect (CATE). Work at the intersection of machine learning and causal inference has produced a variety of effective CATE estimators. In practice, however, deploying these estimators is hindered by the absence of counterfactual labels, which makes it difficult to choose the best CATE estimator with conventional model selection procedures such as cross-validation. Existing approaches to CATE estimator selection, such as plug-in and pseudo-outcome metrics, face two inherent challenges. First, they require the user to choose both the form of the metric and the underlying machine learning models used to fit nuisance parameters or plug-in learners. Second, they are not specifically designed to select a robust estimator. To address these challenges, this paper introduces a novel approach, the Distributionally Robust Metric (DRM), for CATE estimator selection. The proposed DRM eliminates the need to fit additional models and is also effective at selecting a robust CATE estimator. Experimental studies demonstrate the efficacy of DRM: it consistently identifies superior estimators while mitigating the risk of selecting inferior ones.