Temperature sampling is a conventional approach to diversifying large language model predictions. As temperature increases, predictions become more diverse but also more vulnerable to hallucinations, that is, generated tokens that are sensible but not factual. One common approach to mitigating hallucinations is to provide source (grounding) documents and train the model to produce predictions that bind to and are attributable to the provided source. There appears to be a trade-off between diversity and attribution. To mitigate this trade-off, we propose relaxing the constraint of a fixed temperature across decoding steps, together with a mechanism that guides the dynamic temperature at each step according to its relevance to the source, measured through KL-divergence. Our experiments confirm the trade-off and show that our sampling algorithm outperforms the conventional top-k and top-p algorithms on conversational question-answering and summarization tasks.
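To make the idea of a per-step, KL-guided temperature concrete, the following is a minimal sketch and not the algorithm evaluated in the paper. The abstract does not specify how the KL-divergence is computed or how it maps to a temperature, so everything below is an assumption: the KL-divergence is taken between the next-token distribution conditioned on the source and the one without it, a large divergence is read as "this step depends on the source" and lowers the temperature, and the names `dynamic_temperature`, `t_min`, `t_max`, and `scale` are hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


def dynamic_temperature(logits_with_source, logits_without_source,
                        t_min=0.3, t_max=1.5, scale=1.0):
    """Map source relevance at one decoding step to a temperature.

    Assumption (not from the paper): relevance is KL(p_with || p_without),
    and the temperature decays exponentially from t_max toward t_min as
    that divergence grows.
    """
    p = softmax(logits_with_source)
    q = softmax(logits_without_source)
    kl = kl_divergence(p, q)
    return t_min + (t_max - t_min) * float(np.exp(-scale * kl))


def sample_token(logits_with_source, logits_without_source, rng):
    """Sample one token id from the source-conditioned logits."""
    t = dynamic_temperature(logits_with_source, logits_without_source)
    probs = softmax(logits_with_source, temperature=t)
    return int(rng.choice(len(probs), p=probs)), t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 3-token vocabulary; the source strongly shifts the distribution.
    token, temp = sample_token([5.0, 0.0, 0.0], [0.0, 0.0, 5.0], rng)
    print(token, temp)
```

Under these assumptions, steps where the source barely changes the prediction are sampled near `t_max` (more diversity), while steps where the source dominates are sampled near `t_min` (closer attribution). Other mappings from divergence to temperature would fit the abstract's description equally well.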