Recently, large pretrained language models have achieved compelling performance on commonsense benchmarks. Nevertheless, it is unclear what commonsense knowledge the models learn and whether they solely exploit spurious patterns. Feature attributions are popular explainability techniques that identify important input concepts for model outputs. However, commonsense knowledge tends to be implicit and is rarely stated explicitly in inputs, so these methods cannot infer models' implicit reasoning over the mentioned concepts. We present CommonsenseVIS, a visual explanatory system that uses external commonsense knowledge bases to contextualize model behavior for commonsense question-answering. Specifically, we extract relevant commonsense knowledge in inputs as references to align model behavior with human knowledge. Our system features multi-level visualization and interactive model probing and editing for different concepts and their underlying relations. Through a user study, we show that CommonsenseVIS helps NLP experts conduct a systematic and scalable visual analysis of models' relational reasoning over concepts in different situations.