This thesis explores the generation of local explanations for already deployed machine learning models, aiming to identify optimal conditions for producing meaningful explanations given both the data and the users' requirements. The primary goal is to develop methods that generate explanations for any model while ensuring that these explanations remain faithful to the underlying model and comprehensible to users. The thesis is divided into two parts. The first part enhances a widely used rule-based explanation method. It then introduces a novel approach for evaluating whether linear explanations are suitable for approximating a model. Additionally, it presents a comparative experiment between two families of counterfactual explanation methods to analyze the advantages of one over the other. The second part focuses on user experiments that assess the impact of three explanation methods and two distinct representations. These experiments measure how users perceive their interaction with the model, in terms of understanding and trust, depending on the explanation and representation they are shown. This research contributes to better explanation generation, with potential implications for enhancing the transparency, trustworthiness, and usability of deployed AI systems.