This paper presents null-shot prompting, which exploits hallucination in large language models (LLMs) by instructing a model to use information from an "Examples" section that does not exist in the provided context when performing a task. Reducing hallucination remains crucial for both everyday and critical uses of LLMs. We nevertheless propose that, while current LLMs still hallucinate, hallucination can be exploited to improve task performance over standard zero-shot prompting. Experiments with eight LLMs show performance improvements on the majority of eight datasets, including reading comprehension, arithmetic reasoning, and closed-book question answering. The relative performance gains are inconsistent across the LLMs, which potentially indicates a different degree of inherent hallucination in each model. These differences show that null-shot prompting can be used with existing benchmarking datasets to detect the degree of hallucination in LLMs. We also perform ablation studies, including one with a modified version of null-shot prompting that incorporates ideas from zero-shot chain-of-thought prompting and shows different trends in the results.
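To make the idea concrete, the following is a minimal sketch of how a null-shot prompt might be assembled next to its zero-shot baseline, and how a relative performance change between the two could be computed. The wording of `NULL_SHOT_PHRASE`, the function names, and the percentage-based definition of relative change are illustrative assumptions; the abstract does not specify them.

```python
# Minimal sketch of null-shot vs. zero-shot prompt construction.
# ASSUMPTION: the instruction wording below is hypothetical; the abstract
# only says the model is told to use a nonexistent "Examples" section.
NULL_SHOT_PHRASE = (
    'Look at the examples in the "Examples" section and use the examples '
    "and information from that section to perform the following task."
)


def zero_shot_prompt(task: str) -> str:
    """Baseline: the task text alone, with no extra instruction."""
    return task.strip()


def null_shot_prompt(task: str) -> str:
    """Prepend an instruction that refers to an "Examples" section.

    No such section is ever added to the prompt, so the model can only
    act on the instruction by hallucinating the examples.
    """
    return f"{NULL_SHOT_PHRASE}\n\n{task.strip()}"


def relative_change(baseline: float, treated: float) -> float:
    """Percentage change of `treated` relative to `baseline`.

    ASSUMPTION: this is one common definition of relative performance
    change, not necessarily the one used in the experiments.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treated - baseline) / baseline * 100.0


if __name__ == "__main__":
    task = "What is 17 + 25?"
    print(zero_shot_prompt(task))
    print("---")
    print(null_shot_prompt(task))
```

Under these assumptions, the only difference between the two prompts is the prepended instruction, so any change in benchmark score between them can be attributed to that instruction.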