Large Language Models (LLMs) are increasingly deployed in real-world scenarios where they may lack sufficient information to complete a given task. In such settings, the ability to actively seek out missing information is critical. Existing approaches to enhancing this ability often rely on simplifying assumptions that degrade \textit{worst-case} performance, a serious concern in high-stakes applications. In this work, we use the game of Twenty Questions to evaluate the information-seeking ability of LLMs. We introduce its adversarial counterpart, the Strategic Language Search (SLS) problem, and formalize it, along with its variants, as a two-player zero-sum extensive-form game. We propose Game of Thought (GoT), a framework that applies game-theoretic techniques to approximate a Nash equilibrium (NE) strategy for the restricted variant of the game. Empirical results demonstrate that our approach consistently improves worst-case performance compared to (1) direct prompting-based methods and (2) heuristic-guided search methods across all tested settings.
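To make the game-theoretic framing concrete, the sketch below shows one standard way to approximate a Nash equilibrium of a two-player zero-sum game: regret matching in self-play on a small payoff matrix. This is an illustrative example of the general class of techniques referenced above, not the paper's GoT procedure; the payoff matrix, the function name \texttt{regret\_matching}, and the iteration count are all assumptions made for the sketch.

\begin{verbatim}
import numpy as np

# Illustrative payoff matrix for a small zero-sum game between a questioner
# (row player) and an adversarial answerer (column player). Entries are the
# row player's payoffs; the column player receives their negation.
PAYOFF = np.array([
    [ 1.0, -1.0,  0.5],
    [-0.5,  1.0, -1.0],
    [ 0.0, -0.5,  1.0],
])

def regret_matching(payoff, iterations=20000):
    """Approximate a Nash equilibrium of a zero-sum matrix game via
    regret matching in self-play; the time-averaged strategies converge
    to an (approximate) equilibrium."""
    n_rows, n_cols = payoff.shape
    row_regret, col_regret = np.zeros(n_rows), np.zeros(n_cols)
    row_avg, col_avg = np.zeros(n_rows), np.zeros(n_cols)

    def strategy(regret):
        # Play each action with probability proportional to its positive regret.
        positive = np.maximum(regret, 0.0)
        total = positive.sum()
        return positive / total if total > 0 else np.full(len(regret), 1.0 / len(regret))

    for _ in range(iterations):
        row_strategy = strategy(row_regret)
        col_strategy = strategy(col_regret)
        row_avg += row_strategy
        col_avg += col_strategy

        # Expected payoff of each pure action against the opponent's mixture.
        row_values = payoff @ col_strategy        # row player's action values
        col_values = -(row_strategy @ payoff)     # column player's action values (zero-sum)

        # Accumulate regret for not having played each pure action.
        row_regret += row_values - row_strategy @ row_values
        col_regret += col_values - col_strategy @ col_values

    return row_avg / iterations, col_avg / iterations

if __name__ == "__main__":
    row_ne, col_ne = regret_matching(PAYOFF)
    print("approx. questioner NE strategy:", np.round(row_ne, 3))
    print("approx. answerer NE strategy:  ", np.round(col_ne, 3))
\end{verbatim}

The same no-regret principle extends to extensive-form games (e.g., via counterfactual regret minimization), which is the setting the SLS formalization targets; the matrix-game version is shown here only because it fits in a few lines.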