Large language models (LLMs) are increasingly used as agents that interact with users and with the world. To do so successfully, LLMs must construct representations of the world and form probabilistic beliefs about them. To provide personalized recommendations, for example, the LLM needs to infer a user's preferences from their behavior over multiple interactions. The Bayesian inference framework lays out the optimal way for an agent to update its beliefs as it receives new information. We first show that LLMs fall far short of the standard defined by the Bayesian framework. We then show that by teaching LLMs to mimic the predictions of the normative Bayesian model, we can dramatically improve their ability to update their beliefs; this ability generalizes to new tasks. We conclude that LLMs can effectively learn reasoning skills from examples and generalize those skills to new domains.
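To make the belief-updating standard concrete, below is a minimal sketch of Bayes' rule applied to the preference-inference example from the abstract. It is an illustration only: the genres, likelihood values, and the bayes_update helper are hypothetical and not taken from the paper.

# A minimal sketch of Bayesian belief updating (illustrative; the
# hypotheses and likelihoods are hypothetical, not from the paper).
# The agent maintains a posterior over the user's preferred genre and
# updates it after each observed choice via Bayes' rule:
#   P(h | x) ∝ P(x | h) * P(h)

def bayes_update(prior, likelihood, observation):
    """Return the posterior P(h | observation) over hypotheses h."""
    unnormalized = {h: prior[h] * likelihood[h][observation] for h in prior}
    z = sum(unnormalized.values())  # evidence P(observation)
    return {h: p / z for h, p in unnormalized.items()}

# Hypotheses: which genre does the user prefer? Start from a uniform prior.
beliefs = {"sci-fi": 1 / 3, "romance": 1 / 3, "history": 1 / 3}

# Likelihood P(clicked genre | preferred genre): users mostly, but not
# always, click items from the genre they prefer.
likelihood = {
    "sci-fi":  {"sci-fi": 0.7, "romance": 0.15, "history": 0.15},
    "romance": {"sci-fi": 0.15, "romance": 0.7, "history": 0.15},
    "history": {"sci-fi": 0.15, "romance": 0.15, "history": 0.7},
}

# Update the belief state one interaction at a time.
for clicked in ["sci-fi", "sci-fi", "history", "sci-fi"]:
    beliefs = bayes_update(beliefs, likelihood, clicked)

print(beliefs)  # posterior mass concentrates on "sci-fi"

A normative Bayesian model of this kind defines the target an agent should match: after each observation it reallocates exactly as much belief as the evidence warrants, which is the standard the paper measures LLMs against.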