Recursive decision trees are widely used to estimate heterogeneous causal treatment effects in experimental and observational studies. These methods are typically implemented with CART-type recursive partitioning, using splitting criteria designed to detect variation in treatment effects across covariate-defined subgroups. We study causal tree estimators based on adaptive recursive partitioning and establish lower bounds on their estimation accuracy. The class we analyze includes versions with and without sample splitting, built on common treatment-effect and squared-error splitting criteria. Even in a constant-effect benchmark with randomized treatment assignment, causal trees constructed via standard CART-type splitting rules can have uniform-norm errors that decrease more slowly than any power of the sample size. The underlying mechanism is that greedy recursive partitioning selects highly imbalanced splits with nonvanishing probability, producing terminal nodes that contain very few observations and therefore have large estimation variance. We further show that sample splitting, often called "honesty," does not remove this limitation. As a consequence, causal tree estimators may converge arbitrarily slowly in the uniform norm over the covariate space. At the same time, these estimators can have small integrated mean squared error, so average accuracy can mask local inaccuracy. Our results also clarify the role of balanced-partition assumptions in existing theoretical guarantees for causal forests and related ensemble methods.
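The imbalanced-split mechanism described above can be illustrated with a small simulation. The sketch below is not the paper's estimator or proof setup. It assumes a single uniform covariate, a constant effect `tau`, treatment assigned with probability 0.5, Gaussian noise, and a squared-error split applied to a transformed outcome, which is one illustrative stand-in for a CART-type causal splitting rule. The function names `best_split_size` and `imbalanced_split_rate` and all parameter values are hypothetical. The simulation records how often the root split leaves one child with fewer than 5% of the observations.

```python
import numpy as np


def best_split_size(y_sorted):
    """Left-child size k (1 <= k <= n-1) chosen by a squared-error split.

    `y_sorted` holds the outcomes ordered by the splitting covariate.
    """
    n = y_sorted.size
    left_n = np.arange(1, n)                    # candidate left-child sizes
    left_sum = np.cumsum(y_sorted)[:-1]         # sum of the first k outcomes
    right_n = n - left_n
    right_sum = y_sorted.sum() - left_sum
    # Minimizing the within-child squared error is equivalent to
    # maximizing this between-child quantity.
    gain = left_sum**2 / left_n + right_sum**2 / right_n
    return int(left_n[np.argmax(gain)])


def imbalanced_split_rate(n=1000, reps=300, tau=1.0, edge=0.05, seed=0):
    """Share of simulated root splits whose smaller child has < edge * n points."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.uniform(size=n)                 # covariate, unrelated to the effect
        w = rng.integers(0, 2, size=n)          # randomized treatment, P(W = 1) = 0.5
        y = tau * w + rng.normal(size=n)        # constant treatment effect tau
        # Transformed outcome (an assumption of this sketch): E[y_star | x] = tau.
        y_star = y * (w - 0.5) / 0.25
        k = best_split_size(y_star[np.argsort(x)])
        hits += min(k, n - k) < edge * n
    return hits / reps


rate = imbalanced_split_rate()
print(f"share of root splits with a child under 5% of the sample: {rate:.2f}")
```

For comparison, if the split position were uniform over the candidate positions, roughly 10% of splits would land in this edge region. A markedly larger share in the simulation is consistent with the tendency of greedy splitting to favor highly imbalanced splits when there is no heterogeneity to find.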