As Machine Learning (ML) models are considered for autonomous decisions with significant social impact, the need to understand how these models work is growing rapidly. Explainable Artificial Intelligence (XAI) aims to provide interpretations of the predictions made by ML models, in order to make the models more trustworthy and transparent for the user. For example, selecting relevant input variables for the problem directly impacts the model's ability to learn and make accurate predictions, so obtaining information about input importance plays a crucial role when training the model. One of the main XAI techniques for obtaining input variable importance is sensitivity analysis based on partial derivatives. However, the existing literature on this method provides no justification for the aggregation metrics used to retrieve information from the partial derivatives. In this paper, a theoretical framework is proposed to study sensitivities of ML models using metric techniques. From this metric interpretation, a complete family of new quantitative metrics called $\alpha$-curves is extracted. These $\alpha$-curves provide deeper information on the importance of the input variables for an ML model than existing XAI methods in the literature. We demonstrate the effectiveness of the $\alpha$-curves using synthetic and real datasets, comparing the results against other XAI methods for variable importance and validating the analysis results against the ground truth or information from the literature.