Questions are essential tools for acquiring the necessary information to complete information-seeking tasks. However, large language models (LLMs), especially open-source models, often perform poorly in generating informative questions, as measured by expected information gain (EIG). In this paper, we propose a method to enhance the informativeness of LLM-generated questions in 20-question game dialogues. We sample multiple questions from the same model (LLAMA 2-CHAT 7B) for each game and create pairs of low-EIG and high-EIG questions to apply a Direct Preference Optimization (DPO) algorithm. Our results show that this method produces more effective questions (in terms of EIG), even in domains different from those used to train the DPO model.
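As a sketch of how EIG can be scored for a candidate yes/no question in a 20-questions setting, the following assumes a uniform belief over the remaining candidate items; for a binary question under a uniform prior, the EIG reduces to the entropy of the induced answer distribution. The function names and the toy candidate set are illustrative, not taken from the paper:

```python
import math

def expected_information_gain(candidates, answer_fn):
    """EIG of a yes/no question over a uniform belief on candidates.

    answer_fn(c) -> True if the answer to the question is 'yes'
    for candidate c. Under a uniform prior, the EIG of a binary
    question equals the entropy (in bits) of the answer distribution.
    """
    n = len(candidates)
    yes = sum(1 for c in candidates if answer_fn(c))
    eig = 0.0
    for count in (yes, n - yes):
        p = count / n
        if p > 0:
            eig -= p * math.log2(p)
    return eig

# A question that splits the candidates evenly yields the maximum
# of 1 bit; a question whose answer is already known yields 0 bits.
items = ["dog", "cat", "car", "bus"]
print(expected_information_gain(items, lambda c: c in ("dog", "cat")))  # 1.0
print(expected_information_gain(items, lambda c: False))                # 0.0
```

Under this view, sampling several questions per game state and ranking them by this score is enough to form the low-EIG/high-EIG pairs that a preference-optimization method such as DPO consumes.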