Real-world observational datasets and machine learning have transformed data-driven decision-making, yet many models rely on empirical associations that can mislead because of confounding and subgroup heterogeneity. Simpson's paradox exemplifies this challenge: the association observed in the aggregated data contradicts the associations observed within subgroups, so conclusions drawn from the pooled data alone can be wrong. Existing methods provide limited support for detecting and interpreting such paradoxical associations, especially for practitioners without deep causal expertise. We introduce De-paradox Tree, an interpretable algorithm designed to uncover the hidden subgroup patterns behind paradoxical associations under assumed causal structures involving confounders and effect heterogeneity. It uses novel split criteria and balancing-based procedures to adjust for confounders and to homogenize heterogeneous effects through recursive partitioning. Compared to state-of-the-art methods, De-paradox Tree builds simpler, more interpretable trees, selects relevant covariates, and identifies nested opposite effects, while providing robust estimates of causal effects when causally admissible variables are supplied. The approach addresses limitations of traditional causal inference and machine learning methods with an interpretable framework that supports non-expert practitioners and states its causal assumptions and scope limitations explicitly, enabling more reliable and better-informed decisions in complex observational data settings.
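
To make the reversal described above concrete, the following minimal Python sketch checks whether a stratified 2x2 table shows a Simpson-type reversal. It is not the De-paradox Tree algorithm, whose split criteria and balancing procedures are not specified in this abstract. The function names (`subgroup_rates`, `pooled_rates`, `is_simpson_reversal`) are hypothetical, and the counts follow the widely cited kidney-stone treatment example and should be read as illustrative.

```python
# Illustrative (successes, total) counts per subgroup and treatment arm.
# Values follow the well-known kidney-stone example; treat them as a toy input.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def subgroup_rates(counts):
    """Success rate of each arm within each subgroup."""
    return {
        group: {arm: s / n for arm, (s, n) in arms.items()}
        for group, arms in counts.items()
    }


def pooled_rates(counts):
    """Success rate of each arm after summing counts over all subgroups."""
    totals = {}
    for arms in counts.values():
        for arm, (s, n) in arms.items():
            prev_s, prev_n = totals.get(arm, (0, 0))
            totals[arm] = (prev_s + s, prev_n + n)
    return {arm: s / n for arm, (s, n) in totals.items()}


def is_simpson_reversal(counts, a="A", b="B"):
    """True if one arm wins in every subgroup but loses in the pooled data."""
    sub = subgroup_rates(counts)
    pooled = pooled_rates(counts)
    a_wins_everywhere = all(r[a] > r[b] for r in sub.values())
    b_wins_everywhere = all(r[a] < r[b] for r in sub.values())
    return (a_wins_everywhere and pooled[a] < pooled[b]) or (
        b_wins_everywhere and pooled[a] > pooled[b]
    )


if __name__ == "__main__":
    print(subgroup_rates(counts))
    print(pooled_rates(counts))
    print(is_simpson_reversal(counts))
```

In this toy input, arm A has the higher success rate within both the small and the large subgroup, while arm B has the higher rate once the subgroups are pooled, because the arms are unevenly distributed across subgroups. A check like this only flags the reversal; deciding which level of analysis is causally meaningful still depends on the assumed causal structure.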