While recently developed NLP explainability methods let us open the black box in various ways (Madsen et al., 2022), a missing ingredient in this endeavor is an interactive tool offering a conversational interface. Such a dialogue system can help users explore datasets and models with explanations in a contextualized manner, e.g., via clarification or follow-up questions, through a natural language interface. We adapt the conversational explanation framework TalkToModel (Slack et al., 2022) to the NLP domain, add new NLP-specific operations such as free-text rationalization, and illustrate its generalizability on three NLP tasks (dialogue act classification, question answering, and hate speech detection). To recognize user queries for explanations, we evaluate fine-tuned and few-shot prompting models and implement a novel Adapter-based approach. We then conduct two user studies on (1) the perceived correctness and helpfulness of the dialogues, and (2) simulatability, i.e., how much dialogical explanations objectively help humans infer the model's predicted label when it is not shown. We found that rationalization and feature attribution were helpful in explaining the model behavior. Moreover, users could predict the model outcome more reliably from an explanation dialogue than from one-off explanations.
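As a rough illustration of the simulatability measure described above, the following minimal sketch scores how often a participant's guess matches the model's predicted label (not the gold label). The function name `simulatability`, the label strings, and the toy data are hypothetical and chosen only for illustration; the abstract does not specify how the study computed its scores.

```python
from typing import Sequence


def simulatability(human_guesses: Sequence[str],
                   model_predictions: Sequence[str]) -> float:
    """Fraction of instances where the human guess equals the model's prediction.

    Assumption: one guess per instance, compared against the model's
    predicted label rather than the ground-truth label.
    """
    if len(human_guesses) != len(model_predictions):
        raise ValueError("guesses and predictions must have the same length")
    if len(model_predictions) == 0:
        raise ValueError("at least one instance is required")
    matches = sum(g == p for g, p in zip(human_guesses, model_predictions))
    return matches / len(model_predictions)


# Made-up toy data, not taken from the study.
toy_predictions = ["hate", "not_hate", "hate", "not_hate"]
toy_guesses = ["hate", "not_hate", "not_hate", "not_hate"]
toy_score = simulatability(toy_guesses, toy_predictions)
```

Under this definition, comparing explanation conditions amounts to computing the score separately for guesses collected after a dialogue and for guesses collected after a single explanation.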