PASTA: Pretrained Action-State Transformer Agents

Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, which then serve as a starting point for efficiently solving downstream tasks. In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories. This advancement enables the models to tackle a broad spectrum of tasks, ranging from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper conducts a comprehensive investigation of models referred to as pre-trained action-state transformer agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks, including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. Our objective is to systematically compare various design choices and offer valuable insights that will aid practitioners in developing robust models. Key highlights of our study include tokenization at the component level for actions and states, the use of fundamental pre-training objectives such as next token prediction or masked language modeling, simultaneous training of models across multiple domains, and the application of various fine-tuning strategies. The models developed in this study contain fewer than 7 million parameters, allowing a broad community to use them and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
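The abstract names component-level tokenization and two basic pre-training objectives (next token prediction and masked language modeling) without showing what they mean concretely for RL trajectories. The sketch below illustrates both under stated assumptions. It is not the PASTA authors' implementation. Each continuous state or action component is discretized with uniform bins over a hypothetical range [-1, 1]. States and actions share one vocabulary. The helper names (`discretize`, `tokenize_trajectory`, `next_token_targets`, `mlm_targets`), the `MASK_ID` sentinel, and the 15% mask rate are illustrative choices borrowed from BERT-style masking.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK_ID = -1  # hypothetical sentinel id; real bins are always >= 0


def discretize(x: float, low: float, high: float, n_bins: int) -> int:
    """Map a scalar to one of n_bins uniform bins over [low, high] (clipped)."""
    x = min(max(x, low), high)
    idx = int((x - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # x == high falls into the last bin


def tokenize_trajectory(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    low: float = -1.0,
    high: float = 1.0,
    n_bins: int = 256,  # hypothetical vocabulary size
) -> List[int]:
    """Component-level tokenization: every state and action dimension becomes
    its own token, interleaved as s_0 components, a_0 components, s_1, ..."""
    tokens: List[int] = []
    for s, a in zip(states, actions):
        tokens.extend(discretize(v, low, high, n_bins) for v in s)
        tokens.extend(discretize(v, low, high, n_bins) for v in a)
    return tokens


def next_token_targets(tokens: List[int]) -> Tuple[List[int], List[int]]:
    """Causal objective: predict token t+1 from tokens 0..t."""
    return tokens[:-1], tokens[1:]


def mlm_targets(
    tokens: List[int],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[Optional[int]]]:
    """Masked objective: hide a random subset of tokens; the loss is computed
    only on hidden positions (target None means 'ignored by the loss')."""
    if rng is None:
        rng = random.Random()
    inputs: List[int] = []
    targets: List[Optional[int]] = []
    for t in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            targets.append(t)
        else:
            inputs.append(t)
            targets.append(None)
    return inputs, targets
```

For example, with `n_bins=4`, the two-step trajectory `states=[[0.0, 0.5], [-1.0, 1.0]]`, `actions=[[0.25], [-0.5]]` yields the six tokens `[2, 3, 2, 0, 3, 1]`, one per component. Tokenizing per component rather than per full state vector lets the model attend to, or mask out, individual sensor readings, which is one plausible reason such a scheme could help with the sensor-failure task mentioned above.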

