Decision tree ensembles are widely used in critical domains, making robustness and sensitivity analysis essential to their trustworthiness. We study the feature sensitivity problem, which asks whether an ensemble is sensitive to a specified subset of features (such as protected attributes) whose manipulation can alter model predictions. Existing approaches often yield examples of sensitivity that lie far from the training distribution, limiting their interpretability and practical value. We propose a data-aware sensitivity framework that constrains the sensitive examples to remain close to the dataset, thereby producing realistic and interpretable evidence of model weaknesses. To this end, we develop novel techniques for data-aware search using a combination of mixed-integer linear programming (MILP) and satisfiability modulo theories (SMT) encodings. Our contributions are fourfold. First, we strengthen the NP-hardness result for sensitivity verification, showing that it holds even for trees of depth 1. Second, we develop MILP optimizations that significantly speed up sensitivity verification for single ensembles and that, for the first time, also handle multiclass tree ensembles. Third, we introduce a data-aware framework that generates realistic examples close to the training distribution. Finally, we conduct an extensive experimental evaluation on large tree ensembles, demonstrating scalability to ensembles with up to 800 trees of depth 8 and substantial improvements over the state of the art. This framework provides a practical foundation for analyzing the reliability and fairness of tree-based models in high-stakes applications.
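To make the feature sensitivity problem concrete, the following is a minimal brute-force sketch on a toy ensemble of depth-1 trees (stumps). It is not the MILP/SMT encoding described above, and it assumes one plausible reading of sensitivity: an ensemble is sensitive to a feature subset if two inputs that differ only on those features receive different predicted classes. The toy ensemble `STUMPS`, the helper names, and the simplified data-aware variant (which starts the search from dataset rows rather than bounding a distance to the dataset) are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical toy ensemble. Each stump is (feature, threshold, left, right):
# it contributes `left` if x[feature] <= threshold, otherwise `right`.
STUMPS = [
    (0, 0.5, -1.0, 1.0),   # feature 0 plays the role of a protected attribute
    (1, 2.0, -0.6, 0.4),
    (2, 1.0, 0.3, -0.3),
]


def predict(x, stumps):
    """Binary prediction: class 1 if the summed stump outputs are positive."""
    score = sum(left if x[f] <= t else right for f, t, left, right in stumps)
    return 1 if score > 0 else 0


def candidate_values(stumps, n_features):
    """One representative value per threshold-induced interval of each feature.

    The ensemble output is constant on each interval, so these finitely many
    values cover every distinct behavior of the model.
    """
    values = []
    for f in range(n_features):
        ts = sorted({t for g, t, _, _ in stumps if g == f})
        if not ts:
            values.append([0.0])
            continue
        mids = [(a + b) / 2 for a, b in zip(ts, ts[1:])]
        values.append([ts[0] - 1.0] + mids + [ts[-1] + 1.0])
    return values


def find_sensitive_pair(stumps, n_features, sensitive, starts=None):
    """Search for (x, x_alt) that differ only on `sensitive` features and
    receive different predictions. Returns None if no such pair is found.

    If `starts` is given (for example, dataset rows), only those points are
    used as x; this is a simplified stand-in for a data-aware search.
    """
    values = candidate_values(stumps, n_features)
    if starts is None:
        starts = product(*values)
    for x in starts:
        x = tuple(x)
        # Sensitive features range over all candidates; the rest stay fixed.
        options = [values[f] if f in sensitive else [x[f]]
                   for f in range(n_features)]
        for x_alt in product(*options):
            if predict(x_alt, stumps) != predict(x, stumps):
                return x, x_alt
    return None


if __name__ == "__main__":
    print(find_sensitive_pair(STUMPS, 3, {0}))
    print(find_sensitive_pair(STUMPS, 3, {1, 2}))
    print(find_sensitive_pair(STUMPS, 3, {0}, starts=[(0.0, 1.0, 0.5)]))
```

The enumeration is exponential in the number of features, which is consistent with the hardness result mentioned above and is the reason exact solvers such as MILP or SMT are used for ensembles of realistic size.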