Causal forests estimate how treatment effects vary across individuals, guiding personalized interventions in areas like marketing, operations, and public policy. A standard practice is honest estimation: dividing the data into two samples, one to define subgroups and another to estimate treatment effects within them. This is intended to reduce overfitting and is the default in many software packages. But is it the right choice? We show that honest estimation can reduce the accuracy of estimates of individual treatment effects, especially when effect heterogeneity is substantial and datasets are large enough to detect it. The reason is a bias-variance trade-off: honesty lowers the risk of overfitting but increases the risk of underfitting by limiting the data available to detect and model heterogeneity. Across more than 7,000 benchmark datasets, we find that using honesty by default can require up to 27% more data to match the performance of models trained without it. Honesty is best understood as a form of regularization. Whether to adopt it should depend on the goals of the application and its empirical performance, not on reflexive default use.
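To make the sample-splitting idea concrete, the following is a minimal sketch of honest versus non-honest (adaptive) estimation. It is not a causal forest and not the procedure evaluated in the paper: it uses a single split (a "stump") on one covariate, NumPy only, and simulated data invented for illustration. The function names (`diff_in_means`, `best_split`, `leaf_effects`, `fit_stump`), the step-shaped true effect, and the sample size are all assumptions.

```python
import numpy as np


def diff_in_means(y, w):
    """Difference in mean outcome between treated (w == 1) and control (w == 0)."""
    return y[w == 1].mean() - y[w == 0].mean()


def best_split(x, y, w, min_arm=20):
    """Pick the threshold on x that maximizes the gap between the two leaf effect estimates."""
    best_gap, best_thr = -np.inf, None
    for thr in np.quantile(x, np.linspace(0.1, 0.9, 17)):
        left = x <= thr
        right = ~left
        # Require enough treated and control units in each leaf.
        arm_sizes = (w[left].sum(), (1 - w[left]).sum(),
                     w[right].sum(), (1 - w[right]).sum())
        if min(arm_sizes) < min_arm:
            continue
        gap = abs(diff_in_means(y[left], w[left]) - diff_in_means(y[right], w[right]))
        if gap > best_gap:
            best_gap, best_thr = gap, thr
    return best_thr


def leaf_effects(x, y, w, thr):
    """Estimate the treatment effect in each leaf defined by the threshold."""
    left = x <= thr
    return diff_in_means(y[left], w[left]), diff_in_means(y[~left], w[~left])


def fit_stump(x, y, w, honest, rng):
    """Honest: one half defines the subgroups, the other half estimates effects.
    Adaptive: the full sample is used for both steps."""
    n = len(x)
    if honest:
        idx = rng.permutation(n)
        split_idx, est_idx = idx[: n // 2], idx[n // 2:]
    else:
        split_idx = est_idx = np.arange(n)
    thr = best_split(x[split_idx], y[split_idx], w[split_idx])
    effects = leaf_effects(x[est_idx], y[est_idx], w[est_idx], thr)
    return thr, effects


# Simulated randomized experiment (illustrative assumption, not data from the paper).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)
w = rng.integers(0, 2, n)                # random treatment assignment
tau = np.where(x > 0, 2.0, 0.0)          # true effect: 0 for x <= 0, 2 for x > 0
y = x + tau * w + rng.normal(0, 1, n)

thr_honest, eff_honest = fit_stump(x, y, w, honest=True, rng=rng)
thr_adaptive, eff_adaptive = fit_stump(x, y, w, honest=False, rng=rng)

print("honest:   threshold", thr_honest, "leaf effects", eff_honest)
print("adaptive: threshold", thr_adaptive, "leaf effects", eff_adaptive)
```

In this sketch the honest variant chooses its split from only half of the observations and estimates leaf effects from the other half, which illustrates the trade-off described above: the estimates are not contaminated by the split search, but each step sees less data. A single simulated run like this one says nothing about which variant is more accurate in general; that is an empirical question of the kind the benchmark study addresses.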