Despite the much-discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures. We consider a retrospective verification approach that reflects on the correctness of LM outputs, and introduce Vera, a general-purpose model that estimates the plausibility of declarative statements based on commonsense knowledge. Trained on ~7M commonsense statements created from 19 QA datasets and two large-scale knowledge bases, and with a combination of three training objectives, Vera is a versatile model that effectively separates correct from incorrect statements across diverse commonsense domains. When applied to solving commonsense problems in the verification format, Vera substantially outperforms existing models that can be repurposed for commonsense verification; it also generalizes to unseen tasks and provides well-calibrated outputs. We find that Vera excels at filtering LM-generated commonsense knowledge and is useful in detecting erroneous commonsense statements generated by models like ChatGPT in real-world settings.