This paper presents null-shot prompting, which exploits hallucination in large language models (LLMs) by instructing them to use information from an "Examples" section that does not exist in the provided context when performing a task. Reducing hallucination remains crucial for both everyday and critical uses of LLMs. We propose, however, that while current LLMs still hallucinate, this behavior can be exploited to improve task performance over standard zero-shot prompting. Experiments with six LLMs show performance improvements on the majority of eight datasets, spanning reading comprehension, arithmetic reasoning, and closed-book question answering. The relative gains are inconsistent across LLMs, which may indicate that each model has a different degree of inherent hallucination. These differences suggest that null-shot prompting can be used with existing benchmarking datasets to detect the degree of hallucination in LLMs. We also perform ablation studies, including a modified version of null-shot prompting that incorporates ideas from zero-shot chain-of-thought prompting, which shows different trends in the results.
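To make the prompting idea concrete, the following is a minimal sketch of how a null-shot prompt could be assembled next to a zero-shot baseline. The wording of `NULL_SHOT_PHRASE`, the function names, and the `relative_change` helper are illustrative assumptions; the abstract specifies neither the exact phrase nor how relative performance is computed.

```python
# Illustrative wording only (an assumption): the abstract does not give
# the exact null-shot phrase.
NULL_SHOT_PHRASE = (
    'Look at examples in the "Examples" section and utilize examples and '
    "information from that section to perform the following task."
)


def zero_shot_prompt(task: str) -> str:
    """Baseline: the task text alone, with no extra instruction."""
    return task.strip()


def null_shot_prompt(task: str, phrase: str = NULL_SHOT_PHRASE) -> str:
    """Prepend an instruction that points at an "Examples" section which
    is deliberately absent from the prompt."""
    return f"{phrase}\n\n{task.strip()}"


def relative_change(baseline: float, treated: float) -> float:
    """Relative change of a score against a baseline, in percent.

    Hypothetical helper: one plausible way to compare null-shot and
    zero-shot scores, not necessarily the paper's metric.
    """
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (treated - baseline) / baseline * 100.0


if __name__ == "__main__":
    task = "What is 17 + 25? Answer with a number."
    print(null_shot_prompt(task))
```

The prompt contains no "Examples" section at all; the only difference from the zero-shot baseline is the prepended instruction, so any change in score can be attributed to that phrase.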