As AI-based systems spread across everyday life, the need to understand their decision-making mechanisms grows with them. How far we can trust the statistical inferences made by AI-based decision systems is an increasing concern, especially in high-risk settings such as criminal justice or medical diagnosis, where incorrect inferences may have tragic consequences. Despite their success on problems involving real-world data, deep learning (DL) models cannot quantify the certainty of their predictions, and they are frequently quite confident even when their predictions are incorrect. This work presents a method to infer prominent features in two DL classification models trained on clinical and non-clinical text by employing techniques from topological and geometric data analysis. We create a graph of a model's feature space and cluster the inputs into the graph's vertices by the similarity of their features and prediction statistics. We then extract subgraphs that demonstrate high predictive accuracy for a given label. These subgraphs contain a wealth of information about the features that the DL model has recognized as relevant to its decisions. We infer these features for a given label using a distance metric between probability measures, and we demonstrate the stability of our method compared to the LIME and SHAP interpretability methods. This work establishes that we may gain insights into the decision mechanism of a DL model. In particular, the method allows us to ascertain whether the model bases its decisions on information germane to the problem or on extraneous patterns within the data.
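To make the feature-inference step concrete, the following is a minimal sketch rather than the method used in this work. It assumes that a high-accuracy subgraph has already been extracted and is available as a list of documents, and it uses the total variation distance as a stand-in for the distance between probability measures, since the specific metric is not named here. The function names (`token_distribution`, `total_variation`, `prominent_tokens`) and the whitespace tokenizer are hypothetical choices made for illustration.

```python
from collections import Counter


def token_distribution(docs):
    """Empirical token distribution over a non-empty list of documents.

    Assumption: lowercased whitespace tokenization, for illustration only.
    """
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("docs must contain at least one token")
    return {tok: c / total for tok, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Assumption: a stand-in metric; the source does not specify which one it uses.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in support)


def prominent_tokens(subgraph_docs, background_docs, top_k=3):
    """Tokens most over-represented in the subgraph relative to the background."""
    p = token_distribution(subgraph_docs)
    q = token_distribution(background_docs)
    excess = {t: p[t] - q.get(t, 0.0) for t in p}
    return sorted(excess, key=excess.get, reverse=True)[:top_k]


# Toy data: documents in a hypothetical high-accuracy subgraph vs. the full corpus.
subgraph_docs = ["chest pain acute", "chest pain"]
background_docs = subgraph_docs + ["routine visit", "follow up visit"]

p = token_distribution(subgraph_docs)
q = token_distribution(background_docs)
distance = total_variation(p, q)
top = prominent_tokens(subgraph_docs, background_docs, top_k=2)
```

Under these assumptions, a larger distance indicates that the subgraph's vocabulary departs more strongly from the corpus as a whole, and the tokens with the largest excess mass are candidates for the features the model associates with that label.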