Accumulated Local Effects (ALE) is a model-agnostic approach for globally explaining the results of black-box machine learning (ML) algorithms. Statistical inference based on ALE faces at least three challenges: ensuring the reliability of ALE analyses, especially with small datasets; intuitively characterizing a variable's overall effect in ML; and making robust inferences from ML data analysis. In response, we introduce tools and techniques for statistical inference using ALE: bootstrapped confidence intervals tailored to dataset size, and ALE effect size measures that intuitively indicate effects on both the scale of the outcome variable and a normalized scale. We further demonstrate how to use these tools to draw reliable statistical inferences that reflect the flexible patterns ALE highlights, with implementations available in the 'ale' package in R. This work advances the discussion of ALE and its applicability in ML and statistical analysis, offering practical solutions to these challenges.
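To make the idea of a bootstrapped ALE confidence interval concrete, the following is a minimal sketch in Python with NumPy. It is not the 'ale' package's implementation, and it does not use that package's API. The function names `ale_1d` and `bootstrap_ale`, the quantile-based binning, the percentile interval, and the synthetic linear model are all illustrative assumptions. The sketch resamples only the data and keeps the fitted model fixed; a variant suited to small datasets might also refit the model on each resample, which is not shown here.

```python
import numpy as np

def ale_1d(predict, X, j, edges):
    """First-order ALE of feature j at fixed bin edges (illustrative helper)."""
    x = X[:, j]
    # Bin k covers (edges[k], edges[k+1]]; the minimum value falls into bin 0.
    idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, len(edges) - 2)
    lo, hi = X.copy(), X.copy()
    lo[:, j] = edges[idx]        # move each row to the lower edge of its bin
    hi[:, j] = edges[idx + 1]    # move each row to the upper edge of its bin
    diff = predict(hi) - predict(lo)
    n_bins = len(edges) - 1
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=diff, minlength=n_bins)
    # Average local effect per bin (0 for an empty bin).
    local = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    ale = np.concatenate([[0.0], np.cumsum(local)])  # accumulated effect at each edge
    # Centre so the count-weighted average over bin midpoints is zero.
    mid = (ale[:-1] + ale[1:]) / 2
    return ale - np.sum(mid * counts) / counts.sum()

def bootstrap_ale(predict, X, j, n_bins=10, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the ALE curve (data resampling only)."""
    rng = np.random.default_rng(seed)
    # Bin edges are fixed from the full data so all bootstrap curves are comparable.
    edges = np.unique(np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)))
    n = X.shape[0]
    curves = np.empty((n_boot, len(edges)))
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # resample rows with replacement
        curves[b] = ale_1d(predict, X[rows], j, edges)
    lower, upper = np.quantile(curves, [alpha / 2, 1 - alpha / 2], axis=0)
    return edges, curves.mean(axis=0), lower, upper

# Synthetic demonstration: a known linear "model" stands in for a black box.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
predict = lambda A: 2.0 * A[:, 0] + A[:, 1]
edges, mean, lower, upper = bootstrap_ale(predict, X, j=0)
for e, m, l, u in zip(edges, mean, lower, upper):
    print(f"x={e:6.2f}  ALE={m:6.2f}  [{l:6.2f}, {u:6.2f}]")
```

For this linear stand-in model, the ALE curve for the first feature rises with slope 2, and the interval width reflects only sampling variation in the data, since the model itself is not refit.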