We observe that pre-trained large language models (LLMs) can autoregressively complete complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFGs), to richer spatial patterns found in the Abstract Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art. Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary. These results suggest that, without any additional training, LLMs can serve as general sequence modelers, driven by in-context learning. In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics -- from extrapolating sequences of numbers that represent states over time to complete simple motions, to least-to-most prompting of reward-conditioned trajectories that can discover and represent closed-loop policies (e.g., a stabilizing controller for CartPole). While difficult to deploy on real systems today due to latency, context size limitations, and compute costs, the approach of using LLMs to drive low-level control may provide an exciting glimpse into how the patterns among words could be transferred to actions.
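As a rough illustration of two ideas mentioned above, the following minimal sketch shows (a) how a sequence of numeric states might be serialized into a text prompt for an LLM to extrapolate, and (b) how a symbolic sequence might be re-expressed with tokens randomly sampled from a vocabulary. This is not the authors' implementation: the function names (`serialize_states`, `build_prompt`, `random_token_map`, `remap`), the comma-and-newline format, and the toy vocabulary are all assumptions made for illustration, and no actual LLM call is included.

```python
import random
from typing import Dict, List, Sequence


def serialize_states(states: Sequence[Sequence[int]]) -> str:
    """Render each state as comma-separated integers, one state per line.

    The format is an assumption for illustration, not a prescribed encoding.
    """
    return "\n".join(", ".join(str(v) for v in state) for state in states)


def build_prompt(states: Sequence[Sequence[int]]) -> str:
    """End the prompt with a newline so that the next generated line
    would be read as the next state in the sequence."""
    return serialize_states(states) + "\n"


def random_token_map(
    symbols: Sequence[str], vocab: Sequence[str], seed: int = 0
) -> Dict[str, str]:
    """Assign every distinct symbol a distinct token drawn at random from vocab.

    Requires at least as many vocab entries as distinct symbols.
    """
    rng = random.Random(seed)
    unique = sorted(set(symbols))
    return dict(zip(unique, rng.sample(list(vocab), len(unique))))


def remap(sequence: Sequence[str], mapping: Dict[str, str]) -> List[str]:
    """Rewrite a sequence symbol by symbol, preserving its repetition structure."""
    return [mapping[s] for s in sequence]


# Hypothetical 2-D states (e.g. discretized positions) over three time steps.
states = [[10, 50], [12, 50], [14, 50]]
prompt = build_prompt(states)

# A toy pattern re-expressed with arbitrary tokens from a toy vocabulary.
pattern = ["a", "b", "a", "b", "a"]
toy_vocab = ["apple", "rhino", "offer", "quilt", "zebra"]
mapping = random_token_map(pattern, toy_vocab, seed=0)
remapped = remap(pattern, mapping)
```

In this sketch, `prompt` would be passed to a text-completion model and the returned line parsed back into integers as the predicted next state. The remapped pattern keeps the same alternation structure as the original, which is the property the token-invariance observation relies on.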