Commonsense reasoning in text generation has recently attracted much attention. Generative commonsense reasoning requires machines, given a group of keywords, to compose a single coherent sentence that is plausible under commonsense. Existing datasets for this task focus on everyday scenarios, so it remains unclear how well machines reason under specific geographical and temporal contexts. We formalize this challenging task as SituatedGen, in which machines with commonsense should generate a pair of contrastive sentences given a group of keywords that include geographical or temporal entities. We introduce a corresponding English dataset of 8,268 contrastive sentence pairs, built upon several existing commonsense reasoning benchmarks with minimal manual labor. Experiments show that state-of-the-art generative language models struggle to generate sentences with commonsense plausibility and still lag far behind human performance. Our dataset is publicly available at https://github.com/yunx-z/situated_gen.