Temperature sampling is a conventional approach to diversifying large language model predictions. As temperature increases, predictions become more diverse but also more vulnerable to hallucinations, that is, generated tokens that are plausible but not factual. One common approach to mitigating hallucinations is to provide source (grounding) documents and to train the model to produce predictions that bind to, and are attributable to, the provided source. There appears to be a trade-off between diversity and attribution. To mitigate this trade-off, we propose relaxing the constraint of a fixed temperature over decoding steps, together with a mechanism that guides the dynamic temperature according to the relevance of each decoding step to the source, measured through KL-divergence. Our experiments confirm the existence of the trade-off and show that our sampling algorithm outperforms the conventional top-k and top-p algorithms on conversational question-answering and summarization tasks.
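The abstract does not spell out which distributions the KL-divergence compares or how the divergence is mapped to a temperature, so the following is only a minimal sketch under stated assumptions. It assumes the KL-divergence is taken between the next-token distribution conditioned on the source and the one obtained without the source, and that the temperature decays exponentially as that divergence grows (a large divergence is read as "this step depends on the source, so sample conservatively"). The function names, the `sigma` and `min_temperature` parameters, and the decay rule are all hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def dynamic_temperature(kl, base_temperature=1.0, sigma=1.0, min_temperature=1e-3):
    """Assumed mapping: halve the temperature for every `sigma` nats of KL.

    The floor keeps the temperature strictly positive for very large KL.
    """
    return max(base_temperature * 0.5 ** (kl / sigma), min_temperature)


def sample_step(logits_with_source, logits_without_source,
                base_temperature=1.0, sigma=1.0, rng=random):
    """Sample one token with a per-step temperature guided by KL-divergence."""
    p = softmax(logits_with_source)      # model conditioned on the source
    q = softmax(logits_without_source)   # same model, source withheld (assumption)
    kl = kl_divergence(p, q)
    temperature = dynamic_temperature(kl, base_temperature, sigma)
    probs = softmax(logits_with_source, temperature=temperature)
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token, temperature, kl
```

Under these assumptions, a step where the source changes nothing (identical logits) keeps the base temperature, while a step where the source sharply shifts the distribution is sampled at a lower temperature. Calling `sample_step` once per decoding step, with logits from two forward passes, would give a temperature that varies over the sequence instead of staying fixed.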