We examine the ability of large language models (LLMs) to generate salient (interesting) negative statements about real-world entities, a research topic that has emerged in the last few years. We probe the LLMs using zero-shot and k-shot unconstrained probes, and compare the results with traditional methods for negation generation, i.e., pattern-based textual extractions and knowledge-graph-based inferences, as well as with crowdsourced gold statements. We measure the correctness and salience of the generated lists for subjects from different domains. Our evaluation shows that guided (k-shot) probes do improve the quality of generated negatives compared to the zero-shot variant. Nevertheless, with both types of prompts, LLMs still struggle with the notion of factuality of negatives, frequently generating ambiguous statements or statements that contain negative keywords but carry a positive meaning.