Large language models (LLMs) are increasingly used to simulate or automate human behavior in complex sequential decision-making settings. A natural question is whether LLMs make decisions in a manner similar to humans and whether they can achieve comparable (or superior) performance. In this work, we focus on the exploration-exploitation (E&E) tradeoff, a fundamental aspect of dynamic decision-making under uncertainty. We employ canonical multi-armed bandit (MAB) experiments introduced in the cognitive science and psychiatry literature to conduct a comparative study of the E&E strategies of LLMs, humans, and MAB algorithms. We use interpretable choice models to capture the E&E strategies of the agents and investigate how enabling thinking traces, through both prompting strategies and thinking models, shapes LLM decision-making. We find that enabling thinking in LLMs makes their behavior more human-like, characterized by a mix of random and directed exploration. In a simple stationary setting, thinking-enabled LLMs exhibit levels of random and directed exploration similar to those of humans. However, in more complex, non-stationary environments, LLMs struggle to match human adaptability, particularly in effective directed exploration, despite achieving similar regret in certain scenarios. Our findings highlight both the promise and the limits of LLMs as simulators of human behavior and as tools for automated decision-making, and point to potential areas for improvement.
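The distinction between random and directed exploration can be made concrete with a small sketch. The Python below is illustrative only and is not the choice model fitted in this work: it assumes a two-armed bandit, Gaussian beliefs about each arm summarized by a mean and a standard deviation, and a probit-style choice rule of a kind used in the cognitive science literature. The function name `prob_choose_arm1` and the weights `w_value`, `w_directed`, and `w_random` are hypothetical names chosen for this example.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_choose_arm1(
    mean1: float,
    mean2: float,
    sd1: float,
    sd2: float,
    w_value: float = 1.0,
    w_directed: float = 1.0,
    w_random: float = 1.0,
) -> float:
    """Probability of choosing arm 1 in a two-armed bandit (illustrative).

    Assumed form: P(arm 1) = Phi(w_value * V + w_directed * RU + w_random * V / TU)
      V  = mean1 - mean2               estimated value difference
      RU = sd1 - sd2                   relative uncertainty (directed exploration)
      TU = sqrt(sd1**2 + sd2**2)       total uncertainty (random exploration)
    """
    total_uncertainty = math.sqrt(sd1 ** 2 + sd2 ** 2)
    if total_uncertainty <= 0.0:
        raise ValueError("at least one arm must have positive uncertainty")
    value_diff = mean1 - mean2
    relative_uncertainty = sd1 - sd2
    z = (
        w_value * value_diff
        + w_directed * relative_uncertainty
        + w_random * value_diff / total_uncertainty
    )
    return normal_cdf(z)


if __name__ == "__main__":
    # Equal estimated values, but arm 1 is less well known: a purely
    # "directed" agent prefers the more uncertain arm.
    print(prob_choose_arm1(0.0, 0.0, 2.0, 1.0, w_value=0.0, w_random=0.0))
    # Arm 1 looks better: a purely "random" agent chooses it less reliably
    # when total uncertainty is high than when it is low.
    print(prob_choose_arm1(1.0, 0.0, 0.5, 0.5, w_value=0.0, w_directed=0.0))
    print(prob_choose_arm1(1.0, 0.0, 3.0, 3.0, w_value=0.0, w_directed=0.0))
```

In this sketch, a positive `w_directed` favors the arm the agent is more uncertain about, while `w_random` divides the value difference by total uncertainty, so choices move toward chance as overall uncertainty grows. Fitting such weights to the choices of humans, LLMs, and bandit algorithms is one way an interpretable choice model can separate the two kinds of exploration.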