Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. Marginal contribution feature importance (MCI) was developed to break this trend by providing a useful framework for quantifying the relationships in data. In this work, we aim to improve upon the theoretical properties, performance, and runtime of MCI by introducing ultra-marginal feature importance (UMFI), which uses dependence removal techniques from the AI fairness literature as its foundation. We first propose axioms for feature importance methods that seek to explain the causal and associative relationships in data, and we prove that UMFI satisfies these axioms under basic assumptions. We then show on real and simulated data that UMFI performs better than MCI, especially in the presence of correlated interactions and unrelated features, while partially learning the structure of the causal graph and reducing the exponential runtime of MCI to super-linear.
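To make the idea concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes that UMFI scores a feature by the gain in model performance when that feature is added to a version of the remaining features from which dependence on it has been removed. For illustration only, the dependence removal step is swapped for simple linear residualization, and the evaluation function is the in-sample R² of an ordinary least-squares fit. The names `r_squared`, `remove_linear_dependence`, and `umfi_linear_sketch` are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)


def remove_linear_dependence(X_rest, x_i):
    """Assumed stand-in for dependence removal: residualize each column on x_i."""
    A = np.column_stack([np.ones(len(x_i)), x_i])
    coef, *_ = np.linalg.lstsq(A, X_rest, rcond=None)
    return X_rest - A @ coef


def umfi_linear_sketch(X, y):
    """Score each feature by the R^2 gained when it is added back to the
    remaining features after their linear dependence on it has been removed."""
    scores = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        x_i = X[:, i]
        rest = np.delete(X, i, axis=1)
        S = remove_linear_dependence(rest, x_i)
        with_i = r_squared(np.column_stack([S, x_i]), y)
        without_i = r_squared(S, y)
        scores[i] = with_i - without_i
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    x0 = rng.normal(size=n)               # drives the response
    x1 = x0 + 0.1 * rng.normal(size=n)    # strongly correlated with x0
    x2 = rng.normal(size=n)               # unrelated to the response
    y = 2.0 * x0 + rng.normal(size=n)
    print(umfi_linear_sketch(np.column_stack([x0, x1, x2]), y))
```

Under these simplifying choices the score of each feature reduces to its squared correlation with the response, so the sketch cannot capture interactions. A nonlinear model and a stronger dependence removal step would be needed for that.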