Causal forests estimate how treatment effects vary across individuals, guiding personalized interventions in areas such as marketing, operations, and public policy. A standard modeling practice with this method is honest estimation: splitting the data into two samples, one used to define subgroups and the other to estimate treatment effects within them. This practice is intended to reduce overfitting and is the default in many software packages. But is it always the right choice? In this paper, we show that honest estimation can reduce the accuracy of individual-level treatment effect estimates, especially when individuals differ substantially in how they respond to treatment and the data are rich enough to uncover those differences. The core issue is a classic bias-variance trade-off: honesty lowers the risk of overfitting but raises the risk of underfitting, because it limits the data available to detect and model heterogeneity. Across 7,500 benchmark datasets, we find that defaulting to honesty can cost the equivalent of requiring up to 25% more data to match the performance of models trained without it. We argue that honesty is best understood as a form of regularization, and that its use should be guided by application goals and empirical evaluation rather than adopted reflexively.
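The sample-splitting idea behind honest estimation can be illustrated with a minimal sketch: one half of the data chooses a single subgroup boundary, and the held-out half re-estimates the treatment effect within each resulting subgroup. This is an illustrative toy on synthetic data, not the paper's implementation or any library's API; the data-generating process, candidate thresholds, and splitting criterion are all assumptions made for the example.

```python
import numpy as np

# Synthetic data with a known heterogeneous effect (an illustrative assumption).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)           # one covariate
w = rng.integers(0, 2, n)           # randomized binary treatment
tau = np.where(x > 0, 2.0, 0.5)     # true effect differs across x
y = tau * w + rng.normal(0, 1, n)   # observed outcome

# Honest split: one half searches for the boundary,
# the other half estimates effects inside the chosen subgroups.
split_idx = np.arange(n) < n // 2
est_idx = ~split_idx

def ate(mask):
    """Difference-in-means treatment effect on the selected rows."""
    return y[mask & (w == 1)].mean() - y[mask & (w == 0)].mean()

def gain(c, mask):
    """Heterogeneity criterion: squared gap between child effects."""
    left, right = mask & (x <= c), mask & (x > c)
    return (ate(left) - ate(right)) ** 2

# Step 1 (splitting sample): pick the threshold with the largest gap.
candidates = np.linspace(-0.9, 0.9, 37)
best_c = max(candidates, key=lambda c: gain(c, split_idx))

# Step 2 (estimation sample): re-estimate effects on held-out data,
# so the leaf estimates are not biased by the threshold search.
tau_left = ate(est_idx & (x <= best_c))
tau_right = ate(est_idx & (x > best_c))
print(best_c, tau_left, tau_right)
```

The bias-variance trade-off discussed in the abstract is visible here: step 2 uses only half the data per leaf, which removes selection bias from the search in step 1 but doubles the variance of each leaf estimate relative to using all rows, which is why honesty can underfit when heterogeneity is strong and detectable.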