The Rashomon Effect describes the following phenomenon: for a given dataset, there may exist many models with equally good performance but different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view of three different comparison scenarios and conduct a quantitative evaluation across different datasets, models, attribution methods, and metrics. We find that hyperparameter tuning plays a role and that metric selection matters. Our results provide empirical support for previously anecdotal evidence and highlight challenges for both scientists and practitioners.