With the growing pervasiveness of artificial intelligence, the ability to explain the inferences made by machine learning models has become increasingly important. Numerous techniques for model explainability have been proposed, with natural-language textual explanations among the most widely used approaches. When applied to tabular data, these explanations typically draw on input features to justify a given inference. Consequently, a user's ability to interpret an explanation depends on their understanding of the input features. To quantify this feature-level understanding, Rossberg et al. introduced the Feature Understandability Scale. Building on that work, this proof-of-concept study collects understandability scores across two datasets, proposes a methodology for co-optimising understandability and accuracy, and presents the resulting explanations alongside the corresponding model accuracies. This work contributes to the body of knowledge on interpretability by design. It is found that understandability and accuracy can be successfully co-optimised while maintaining high classification performance. The resulting explanations appear more understandable at face value. Further research will aim to confirm these findings through user evaluation.
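As a rough illustration of one way such a co-optimisation could be set up (the abstract does not specify the authors' actual procedure), the sketch below greedily selects features to maximise a weighted combination of cross-validated accuracy and the mean understandability score of the chosen features. The dataset, the randomly generated understandability scores, the trade-off weight `ALPHA`, and the greedy forward search are all illustrative assumptions, not the method proposed in the paper.

```python
# Hypothetical sketch of co-optimising accuracy and feature understandability.
# The scalarised objective, the greedy search, and the per-feature scores are
# illustrative assumptions, not the authors' actual method.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hypothetical per-feature understandability scores in [0, 1], e.g. elicited
# with the Feature Understandability Scale (random here, for illustration only).
rng = np.random.default_rng(0)
understandability = rng.uniform(0.2, 1.0, size=X.shape[1])

ALPHA = 0.7  # assumed trade-off weight between accuracy and understandability

def objective(selected):
    """Scalarised co-optimisation target: ALPHA * cross-validated accuracy
    plus (1 - ALPHA) * mean understandability of the selected features."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    acc = cross_val_score(model, X[:, selected], y, cv=5).mean()
    return ALPHA * acc + (1 - ALPHA) * understandability[selected].mean()

# Greedy forward selection: add the feature that most improves the combined
# objective; stop once no single addition yields an improvement.
selected, remaining = [], list(range(X.shape[1]))
best_score = -np.inf
while remaining:
    score, feat = max((objective(selected + [f]), f) for f in remaining)
    if score <= best_score:
        break
    best_score = score
    selected.append(feat)
    remaining.remove(feat)

print(f"selected features: {selected}, combined score: {best_score:.3f}")
```

A scalarised objective of this kind is only the simplest formulation; a Pareto-front search over the two objectives would expose the full accuracy/understandability trade-off rather than a single compromise point.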