Accumulated Local Effects (ALE) is a model-agnostic method for globally explaining the results of black-box machine learning (ML) algorithms. Conducting statistical inference based on ALE poses at least three challenges: ensuring the reliability of ALE analyses, especially with small datasets; intuitively characterizing a variable's overall effect in ML; and making robust inferences from ML data analysis. In response, we introduce tools and techniques for statistical inference using ALE: bootstrapped confidence intervals tailored to dataset size, and ALE effect size measures that intuitively indicate effects on both the outcome variable scale and a normalized scale. We also demonstrate how to use these tools to draw reliable statistical inferences that reflect the flexible patterns ALE highlights. Implementations are available in the 'ale' package in R. This work advances the discussion of ALE and its applicability in ML and statistical analysis, offering practical solutions to these challenges.