This paper introduces the shapr R package, a versatile tool for generating Shapley value-based prediction explanations for machine learning and statistical regression models. Moreover, the shaprpy Python library brings the core capabilities of shapr to the Python ecosystem. Shapley values originate from cooperative game theory in the 1950s, but have over the past few years become a widely used method for quantifying how a model's features/covariates contribute to specific prediction outcomes. The shapr package emphasizes conditional Shapley value estimates, providing a comprehensive range of approaches for accurately capturing feature dependencies, a crucial aspect of correct model explanation that similar software typically lacks. In addition to regular tabular data, the shapr R package includes specialized functionality for explaining time series forecasts. The package offers a minimal set of user functions with sensible default values for most use cases while providing extensive flexibility for advanced users to fine-tune computations. Additional features include parallelized computations, iterative estimation with convergence detection, and rich visualization tools. shapr can also compute causal and asymmetric Shapley values when causal information is available. Overall, the shapr and shaprpy packages aim to enhance the interpretability of predictive models within a powerful and user-friendly framework.