The ability to interpret machine learning models has become increasingly important as their use in data science continues to rise. Most current interpretability methods are optimized to work on either (\textit{i}) a global scale, where the goal is to rank features based on their contributions to overall variation in an observed population, or (\textit{ii}) a local scale, where the goal is to detail how important a feature is to a particular individual in the dataset. In this work, we present the ``GlObal And Local Score'' (GOALS) operator: a simple \textit{post hoc} approach to simultaneously assess local and global variable importance in nonlinear models. Motivated by problems in statistical genetics, where understanding how genetic markers affect trait architecture both among individuals and across populations is of high interest, we demonstrate our approach using Gaussian process regression. With detailed simulations and real data analyses, we illustrate the flexibility and efficiency of GOALS relative to state-of-the-art variable importance strategies.