The ability to interpret machine learning models has become increasingly important as their use in data science continues to grow. Most current interpretability methods are optimized to work at either (\textit{i}) the global level, where the goal is to rank features by their contributions to the overall variation in an observed population, or (\textit{ii}) the local level, where the goal is to quantify how important a feature is for a particular individual in the data set. In this work, we propose a new operator called the ``GlObal And Local Score'' (GOALS): a simple \textit{post hoc} approach for simultaneously assessing local and global variable importance in nonlinear models. Motivated by problems in biomedicine, we demonstrate the approach using Gaussian process regression, a setting where understanding how genetic markers are associated with disease progression, both within individuals and across populations, is of high interest. Detailed simulations and real data analyses illustrate the flexibility and efficiency of GOALS compared with state-of-the-art variable importance strategies.