Can large language models explore in-context?

We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. We experiment with GPT-3.5, GPT-4, and Llama2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: i) Across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; ii) All other configurations did not result in robust exploratory behavior, including those with chain-of-thought reasoning but unsummarized history. Although these findings can be interpreted positively, they suggest that external summarization -- which may not be possible in more complex settings -- is important for obtaining desirable behavior from LLM agents. We conclude that non-trivial algorithmic interventions, such as fine-tuning or dataset curation, may be required to empower LLM-based decision making agents in complex settings.
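To make the setup concrete, here is a minimal sketch of a multi-armed bandit environment and of an interaction history summarized externally as per-arm sufficient statistics (pull counts and average rewards). It assumes Bernoulli rewards. The names `BernoulliBandit`, `summarize`, and `render_summary`, and the wording of the rendered text, are hypothetical illustrations and are not taken from the paper's prompts or code.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BernoulliBandit:
    """Toy bandit: arm i pays reward 1 with probability means[i], else 0 (assumed reward model)."""
    means: list
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def pull(self, arm: int) -> int:
        return 1 if self.rng.random() < self.means[arm] else 0


def summarize(history, n_arms):
    """Collapse a raw history of (arm, reward) pairs into per-arm (count, mean reward).

    The mean is None for arms that have never been pulled.
    """
    counts = [0] * n_arms
    totals = [0] * n_arms
    for arm, reward in history:
        counts[arm] += 1
        totals[arm] += reward
    return [(c, t / c if c else None) for c, t in zip(counts, totals)]


def render_summary(stats):
    """Format the statistics as plain text that could be placed in a prompt (hypothetical wording)."""
    lines = []
    for arm, (count, mean) in enumerate(stats):
        if count == 0:
            lines.append(f"Arm {arm}: never pulled")
        else:
            lines.append(f"Arm {arm}: pulls: {count}, average reward: {mean:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    env = BernoulliBandit(means=[0.4, 0.6, 0.5])
    history = []
    # A uniformly random policy stands in for the LLM agent here.
    for _ in range(20):
        arm = env.rng.randrange(len(env.means))
        history.append((arm, env.pull(arm)))
    print(render_summary(summarize(history, len(env.means))))
```

The unsummarized alternative would place the raw `history` list in the prompt instead, leaving the model to aggregate it itself.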


