Explanations in machine learning are critical for trust, transparency, and fairness. Yet, complex disagreements among these explanations limit the reliability and applicability of machine learning models, especially in high-stakes environments. We formalize four fundamental ranking-based explanation disagreement problems and introduce a novel framework, EXplanation AGREEment (EXAGREE), to bridge diverse interpretations in explainable machine learning, particularly from stakeholder-centered perspectives. Our approach leverages a Rashomon set for attribution predictions and then optimizes within this set to identify Stakeholder-Aligned Explanation Models (SAEMs) that minimize disagreement with diverse stakeholder needs while maintaining predictive performance. Rigorous empirical analysis on synthetic and real-world datasets demonstrates that EXAGREE reduces explanation disagreement and improves fairness across subgroups in various domains. EXAGREE not only provides researchers with a new direction for studying explanation disagreement problems but also offers data scientists a tool for making better-informed decisions in practical applications.
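To make the core selection step concrete, below is a minimal sketch of the idea under strong simplifying assumptions: among a set of near-optimal models (a crude stand-in for a Rashomon set), choose the one whose feature-attribution ranking best agrees with a stakeholder's desired ranking. The model families, the |coefficient|-based attribution, the epsilon threshold, and the use of Kendall's tau as the agreement metric are all illustrative choices, not the paper's actual optimization procedure.

```python
# Toy sketch: select a Stakeholder-Aligned Explanation Model (SAEM) from a
# set of similarly accurate models by maximizing rank agreement between the
# model's feature attributions and a stakeholder's desired ranking.
# All concrete choices here (Ridge/Lasso candidates, |coef_| as attribution,
# Kendall's tau, epsilon = 0.01) are hypothetical illustrations.
import numpy as np
from scipy.stats import kendalltau
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Build a crude "Rashomon set": candidate models whose held-out R^2 is
# within epsilon of the best candidate's R^2.
candidates = (
    [Ridge(alpha=a).fit(X_tr, y_tr) for a in np.logspace(-2, 3, 11)]
    + [Lasso(alpha=a).fit(X_tr, y_tr) for a in np.logspace(-2, 1, 7)]
)
scores = np.array([m.score(X_te, y_te) for m in candidates])
epsilon = 0.01
rashomon = [m for m, s in zip(candidates, scores) if s >= scores.max() - epsilon]

# A stakeholder's desired importance ranking over the 5 features
# (0 = most important); purely hypothetical for this demo.
stakeholder_rank = np.array([0, 1, 2, 3, 4])

def attribution_rank(model):
    """Rank features by |coefficient|, a simple attribution proxy."""
    return np.argsort(np.argsort(-np.abs(model.coef_)))

# Pick the in-set model whose attribution ranking maximizes Kendall's tau
# with the stakeholder ranking: accuracy is preserved by construction
# (every candidate is within epsilon of the best), while explanation
# disagreement is minimized over the set.
saem = max(rashomon, key=lambda m: kendalltau(attribution_rank(m), stakeholder_rank)[0])
print("SAEM held-out R^2:", saem.score(X_te, y_te))
print("SAEM feature ranking:", attribution_rank(saem))
```

The sketch separates the two constraints the abstract names: membership in the near-optimal set enforces predictive performance, and the rank-agreement objective handles stakeholder alignment; the paper's actual method optimizes within the Rashomon set rather than enumerating a fixed candidate list.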