We examine the ability of large language models (LLMs) to generate salient (interesting) negative statements about real-world entities, an emerging research topic of the last few years. We probe the LLMs using zero- and k-shot unconstrained probes, and compare them with traditional methods for negation generation, i.e., pattern-based textual extractions and knowledge-graph-based inferences, as well as crowdsourced gold statements. We measure the correctness and salience of the generated lists about subjects from different domains. Our evaluation shows that guided probes do improve the quality of generated negatives compared to the zero-shot variant. Nevertheless, with both types of prompt, LLMs still struggle with the notion of factuality of negatives, frequently generating ambiguous statements or statements with negative keywords but a positive meaning.