Questions combine our mastery of language with our remarkable facility for reasoning about uncertainty. How do people navigate vast hypothesis spaces to pose informative questions given limited cognitive resources? We study these tradeoffs in a classic grounded question-asking task based on the board game Battleship. Our language-informed program sampling (LIPS) model uses large language models (LLMs) to generate natural language questions, translate them into symbolic programs, and evaluate their expected information gain. We find that with a surprisingly modest resource budget, this simple Monte Carlo optimization strategy yields informative questions that mirror human performance across varied Battleship board scenarios. In contrast, LLM-only baselines struggle to ground questions in the board state; notably, GPT-4V provides no improvement over non-visual baselines. Our results illustrate how Bayesian models of question-asking can leverage the statistics of language to capture human priors, while highlighting some shortcomings of pure LLMs as grounded reasoners.
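To make the pipeline described in the abstract concrete, the following is a minimal sketch of the sample-and-score step, not the authors' implementation. It assumes a toy hypothesis space (one single-tile ship on a 2x2 board), a uniform prior over boards, and deterministic answers, in which case the expected information gain of a question reduces to the entropy of its answer distribution. The LLM stages (generating questions and translating them into programs) are replaced by a hand-written candidate list; all names here (`single_tile_boards`, `expected_information_gain`, `best_of_k`, `CANDIDATES`) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

Board = Tuple[Tuple[int, ...], ...]   # toy encoding: 0 = water, 1 = ship tile
Program = Callable[[Board], Hashable]  # a question compiled to an executable program
Question = Tuple[str, Program]         # (natural language text, program)


def single_tile_boards(size: int) -> List[Board]:
    """Enumerate every board with exactly one ship tile (assumed toy hypothesis space)."""
    return [
        tuple(tuple(1 if (i, j) == (r, c) else 0 for j in range(size)) for i in range(size))
        for r in range(size)
        for c in range(size)
    ]


def expected_information_gain(program: Program, hypotheses: Sequence[Board]) -> float:
    """EIG in bits under a uniform prior and noiseless answers.

    With deterministic answers the posterior entropy depends only on how the
    question partitions the hypotheses, so EIG equals the answer entropy.
    """
    counts = Counter(program(board) for board in hypotheses)
    n = len(hypotheses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_of_k(candidates: Sequence[Question], hypotheses: Sequence[Board],
              k: int, rng: random.Random) -> Question:
    """Monte Carlo search: draw k candidate questions and keep the highest-EIG one.

    Drawing from a fixed list stands in for sampling questions from an LLM.
    """
    sampled = rng.choices(candidates, k=k)
    return max(sampled, key=lambda q: expected_information_gain(q[1], hypotheses))


HYPOTHESES = single_tile_boards(2)

# Hand-written stand-ins for LLM-generated questions and their translations.
CANDIDATES: List[Question] = [
    ("Is there a ship anywhere?", lambda b: any(any(row) for row in b)),
    ("Is there a ship in the top-left tile?", lambda b: b[0][0] == 1),
    ("Which row is the ship in?", lambda b: next(i for i, row in enumerate(b) if any(row))),
    ("Which tile is the ship on?",
     lambda b: next((i, j) for i, row in enumerate(b) for j, v in enumerate(row) if v)),
]

if __name__ == "__main__":
    text, _ = best_of_k(CANDIDATES, HYPOTHESES, k=3, rng=random.Random(0))
    print(text)
```

On this toy space, a question that every board answers identically scores 0 bits, while a question that separates all four boards scores 2 bits. Increasing `k` raises the chance that a high-scoring question is among the samples, which is the sense in which the sample budget trades off against question quality in this sketch.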