Despite the much-discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures. We consider a retrospective verification approach that reflects on the correctness of LM outputs, and introduce Vera, a general-purpose model that estimates the plausibility of declarative statements based on commonsense knowledge. Trained on ~7M commonsense statements created from 19 QA datasets and two large-scale knowledge bases, with a combination of three training objectives, Vera is a versatile model that effectively separates correct from incorrect statements across diverse commonsense domains. When applied to solving commonsense problems in the verification format, Vera substantially outperforms existing models that can be repurposed for commonsense verification; it also generalizes to unseen tasks and provides well-calibrated outputs. We find that Vera excels at filtering LM-generated commonsense knowledge and is useful in detecting erroneous commonsense statements generated by models like ChatGPT in real-world settings.