PASTA: Pretrained Action-State Transformer Agents

Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In the realm of reinforcement learning, researchers have recently adapted these approaches by developing models pre-trained on expert trajectories, enabling them to address a wide range of tasks, from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper presents a comprehensive investigation of models we refer to as Pretrained Action-State Transformer Agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. Our goal is to systematically compare various design choices and provide valuable insights to practitioners for building robust models. Key highlights of our study include tokenization at the action and state component level, using fundamental pre-training objectives like next token prediction, training models across diverse domains simultaneously, and using parameter efficient fine-tuning (PEFT). The developed models in our study contain fewer than 10 million parameters and the application of PEFT enables fine-tuning of fewer than 10,000 parameters during downstream adaptation, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
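As a rough illustration of what tokenization "at the action and state component level" combined with next token prediction could look like, the sketch below flattens a trajectory so that every scalar state or action dimension becomes its own token, then builds shifted input/target pairs. The abstract does not specify the token ordering, embedding, or loss, so the interleaving scheme and the function names `flatten_trajectory` and `next_token_pairs` are assumptions made for this example, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def flatten_trajectory(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
) -> List[float]:
    """Interleave states and actions, emitting one token per scalar component.

    Assumed ordering (hypothetical): s_0[0], s_0[1], ..., a_0[0], ..., s_1[0], ...
    """
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same number of timesteps")
    tokens: List[float] = []
    for state, action in zip(states, actions):
        tokens.extend(float(x) for x in state)   # one token per state dimension
        tokens.extend(float(x) for x in action)  # one token per action dimension
    return tokens


def next_token_pairs(tokens: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Build (input, target) sequences for next token prediction by shifting by one."""
    return list(tokens[:-1]), list(tokens[1:])


# Toy trajectory: two timesteps, 3-dimensional states, 1-dimensional actions.
states = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
actions = [[1.0], [2.0]]

tokens = flatten_trajectory(states, actions)
inputs, targets = next_token_pairs(tokens)
```

With state-level tokenization this toy trajectory would yield four tokens (two states, two actions); at the component level it yields eight, which gives the transformer a longer sequence but lets attention operate over individual sensor and actuator dimensions.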
