Saliency maps can explain a neural model's predictions by identifying important input features. However, they are difficult for laypeople to interpret, especially for instances with many features. To make them more accessible, we formalize the underexplored task of translating saliency maps into natural language and compare methods that address two key challenges of this approach: what to verbalize and how to verbalize it. Using token-level attributions from text classification tasks, we compare two novel methods (search-based and instruction-based verbalizations) against conventional feature importance representations (heatmap visualizations and extractive rationales) in both automatic and human evaluation setups, measuring simulatability, faithfulness, helpfulness and ease of understanding. Instructing GPT-3.5 to generate saliency map verbalizations yields plausible explanations that include associations, abstractive summarization and commonsense reasoning, and achieves by far the highest human ratings; however, these verbalizations do not faithfully capture numeric information and are inconsistent in their interpretation of the task. In comparison, our search-based, model-free verbalization approach efficiently completes templated verbalizations and is faithful by design, but falls short in helpfulness and simulatability. Our results suggest that saliency map verbalization makes feature attribution explanations more comprehensible and less cognitively challenging to humans than conventional representations.
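To make the idea of a templated, model-free verbalization concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes token-level attribution scores are already available, ranks tokens by absolute score, and fills a fixed sentence template. The function name `verbalize_saliency`, the `top_k` parameter, the template wording and the example scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def verbalize_saliency(tokens: Sequence[str],
                       scores: Sequence[float],
                       label: str,
                       top_k: int = 2) -> str:
    """Fill a fixed template with the top_k tokens ranked by |attribution|.

    Hypothetical illustration only: the template and the ranking rule are
    assumptions, not the search procedure described in the abstract.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not tokens:
        raise ValueError("need at least one token")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    magnitudes = [abs(s) for s in scores]
    # Rank positions by attribution magnitude, largest first.
    ranked = sorted(range(len(tokens)), key=lambda i: -magnitudes[i])
    # Report the selected tokens in their original reading order.
    top = sorted(ranked[:top_k])

    # Share of the total attribution magnitude covered by the selected tokens.
    total = sum(magnitudes)
    share = sum(magnitudes[i] for i in top) / total if total > 0 else 0.0

    quoted = [f"'{tokens[i]}'" for i in top]
    if len(quoted) == 1:
        subject = f"The word {quoted[0]} was"
    else:
        subject = f"The words {', '.join(quoted[:-1])} and {quoted[-1]} were"

    return (f"{subject} most important for the prediction '{label}', "
            f"accounting for {share * 100:.0f}% of the total attribution.")


if __name__ == "__main__":
    # Made-up attribution scores for a sentiment example.
    example_tokens = ["the", "movie", "was", "surprisingly", "good"]
    example_scores = [0.02, 0.10, 0.03, 0.25, 0.60]
    print(verbalize_saliency(example_tokens, example_scores, "positive"))
```

Because every number in the output is computed directly from the attribution scores, a template of this kind cannot misreport them. That is the sense in which a templated verbalization can be faithful by construction, at the cost of the richer associations and summaries that a generative model can produce.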