Questions combine our mastery of language with our remarkable facility for reasoning about uncertainty. How do people navigate vast hypothesis spaces to pose informative questions given limited cognitive resources? We study these tradeoffs in a classic grounded question-asking task based on the board game Battleship. Our language-informed program sampling (LIPS) model uses large language models (LLMs) to generate natural language questions, translate them into symbolic programs, and evaluate their expected information gain. We find that with a surprisingly modest resource budget, this simple Monte Carlo optimization strategy yields informative questions that mirror human performance across varied Battleship board scenarios. In contrast, LLM-only baselines struggle to ground questions in the board state; notably, GPT-4V provides no improvement over non-visual baselines. Our results illustrate how Bayesian models of question-asking can leverage the statistics of language to capture human priors, while highlighting some shortcomings of pure LLMs as grounded reasoners.
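To make the pipeline described in the abstract concrete, here is a minimal sketch of the final step: scoring candidate questions by expected information gain (EIG) over a sample of board hypotheses and keeping the best one. It is not the authors' implementation. The function names `expected_information_gain` and `best_question`, the toy board representation, and the hand-written lambdas standing in for LLM-translated programs are all assumptions made for illustration. The sketch also assumes answers are noiseless and sampled boards are equally weighted, in which case the EIG of a question equals the entropy of its answer distribution.

```python
import math
from collections import Counter


def expected_information_gain(question, boards):
    """Entropy, in bits, of the answers a question yields over sampled boards.

    Assumes equally weighted boards and noiseless answers, so this entropy
    equals the expected information gain of asking the question.
    """
    counts = Counter(question(board) for board in boards)
    n = len(boards)
    # sum of p * log2(1/p) over the observed answers
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def best_question(candidates, boards):
    """Return the text of the candidate question with the highest EIG."""
    return max(
        candidates,
        key=lambda text: expected_information_gain(candidates[text], boards),
    )


# Hypothetical toy hypothesis space: each board is the set of (row, col)
# tiles occupied by a single two-tile ship.
boards = [
    frozenset({(0, 0), (0, 1)}),
    frozenset({(0, 1), (0, 2)}),
    frozenset({(1, 0), (1, 1)}),
    frozenset({(1, 1), (1, 2)}),
]

# Hypothetical stand-ins for questions that an LLM would generate and
# translate into programs; here each program is a plain Python function.
candidates = {
    "Is there a ship at (0, 0)?": lambda b: (0, 0) in b,
    "Is the ship in row 0?": lambda b: all(r == 0 for r, _ in b),
    "How many tiles does the ship occupy?": lambda b: len(b),
}

if __name__ == "__main__":
    for text, program in candidates.items():
        print(f"{expected_information_gain(program, boards):.3f}  {text}")
    print("best:", best_question(candidates, boards))
```

In this toy setting, the row question splits the four hypotheses evenly and so scores highest, while the tile-count question has the same answer on every board and carries no information. Sampling a small number of candidate questions and keeping the highest-scoring one is the kind of simple Monte Carlo optimization the abstract refers to.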