Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. Marginal contribution feature importance (MCI) was developed to break this trend by providing a useful framework for quantifying the relationships in data. In this work, we aim to improve upon the theoretical properties, performance, and runtime of MCI by introducing ultra-marginal feature importance (UMFI), which uses dependence removal techniques from the AI fairness literature as its foundation. We first propose axioms for feature importance methods that seek to explain the causal and associative relationships in data, and we prove that UMFI satisfies these axioms under basic assumptions. We then show on real and simulated data that UMFI performs better than MCI, especially in the presence of correlated interactions and unrelated features, while partially learning the structure of the causal graph and reducing the exponential runtime of MCI to super-linear.
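The core idea described above can be sketched in a few lines: to score a feature, first strip the remaining features of their dependence on it, then measure how much adding the feature back improves a model evaluated on the dependence-removed set. The sketch below is illustrative only, not the authors' implementation: it assumes linear-regression residuals as the dependence-removal step and the in-sample R² of an ordinary-least-squares fit as the evaluation function; the names `umfi`, `remove_dependence`, and the toy data are hypothetical.

```python
import numpy as np

def remove_dependence(X_rest, f):
    """Strip linear dependence on feature f from every column of X_rest
    by replacing each column with its least-squares residual on f."""
    A = np.column_stack([f, np.ones_like(f)])          # regress on [f, 1]
    coef, *_ = np.linalg.lstsq(A, X_rest, rcond=None)
    return X_rest - A @ coef

def r2_linear(X, y):
    """Evaluation v(S): in-sample R^2 of an OLS fit on feature set X."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def umfi(X, y, i):
    """Ultra-marginal importance of feature i: the gain in v from adding
    feature i to the other features after their dependence on i is removed."""
    f = X[:, i]
    rest = remove_dependence(np.delete(X, i, axis=1), f)
    return r2_linear(np.column_stack([rest, f]), y) - r2_linear(rest, y)

# Toy data: y depends on x0 (strongly) and x1 (weakly); x2 is unrelated noise,
# so its ultra-marginal importance should be near zero.
rng = np.random.default_rng(0)
n = 2000
x0, x1, x2 = rng.normal(size=(3, n))
y = 2 * x0 + x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
scores = [umfi(X, y, i) for i in range(3)]
```

With this linear proxy, `scores` ranks x0 above x1 and assigns the unrelated x2 a score near zero, mirroring the abstract's claim that UMFI handles unrelated features well; the paper's evaluation function and dependence-removal procedures are more general than the OLS stand-ins used here.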