Decision trees are widely used for non-linear modeling, as they capture interactions between predictors while producing inherently interpretable models. Despite their popularity, performing inference on the non-linear fit remains largely unaddressed. This paper focuses on classification trees and makes two key contributions. First, we introduce a novel tree-fitting method that replaces the greedy splitting of the predictor space in standard tree algorithms with a probabilistic approach. Each split in our approach is selected according to sampling probabilities defined by an exponential mechanism, with a temperature parameter controlling its deviation from the deterministic choice given data. Second, while our approach can fit a tree that with high probability coincides with the fit produced by standard tree algorithms at low temperatures, it is not merely predictive; unlike standard algorithms, it enables inference by taking into account the highly adaptive tree structure. Our method produces pivots directly from the sampling probabilities in the exponential mechanism. In theory, our pivots allow asymptotically valid inference on the parameters in the predictive fit, and in practice, our method delivers powerful inference without sacrificing predictive accuracy, in contrast to data splitting methods.
翻译:暂无翻译