We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making. MTM takes a trajectory, such as a state-action sequence, and aims to reconstruct it conditioned on random subsets of the same trajectory. By training with a highly randomized masking pattern, MTM learns versatile networks that can take on different roles or capabilities simply by choosing appropriate masks at inference time. For example, the same MTM network can be used as a forward dynamics model, an inverse dynamics model, or even an offline RL agent. Through extensive experiments on several continuous control tasks, we show that the same MTM network (i.e., the same weights) can match or outperform specialized networks trained for each of these capabilities. Additionally, we find that state representations learned by MTM can significantly accelerate learning in traditional RL algorithms. Finally, on offline RL benchmarks, we find that MTM is competitive with specialized offline RL algorithms, despite being a generic self-supervised learning method without any explicit RL components. Code is available at https://github.com/facebookresearch/mtm
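To make the idea of "choosing appropriate masks at inference time" concrete, the following is a minimal sketch of how such masks might be built for a trajectory of T state tokens and T action tokens. It is an illustration under stated assumptions, not the MTM implementation: the function names, the dictionary layout, and the convention that True means "visible to the model" are all hypothetical and are not taken from the released code. The policy-style mask is one plausible way to query the model for an action; the abstract does not specify the exact masking used for offline RL.

```python
import numpy as np


def random_mask(T, keep_prob, rng):
    """Training-style mask (hypothetical): each token is visible with prob keep_prob."""
    return {
        "states": rng.random(T) < keep_prob,
        "actions": rng.random(T) < keep_prob,
    }


def forward_dynamics_mask(T, t):
    """Reveal s_0..s_t and a_0..a_t; the hidden s_{t+1} is the prediction target."""
    idx = np.arange(T)
    return {"states": idx <= t, "actions": idx <= t}


def inverse_dynamics_mask(T, t):
    """Reveal s_0..s_{t+1} and a_0..a_{t-1}; the hidden a_t is the prediction target."""
    idx = np.arange(T)
    return {"states": idx <= t + 1, "actions": idx < t}


def policy_mask(T, t):
    """Reveal s_0..s_t and a_0..a_{t-1}; the hidden a_t is the action to take.

    Assumption: a behavior-cloning-style query, used here only as an example.
    """
    idx = np.arange(T)
    return {"states": idx <= t, "actions": idx < t}


# Example usage: the same (hypothetical) network would receive the same
# trajectory tensor in each case; only the mask changes.
rng = np.random.default_rng(0)
train_mask = random_mask(T=4, keep_prob=0.5, rng=rng)
fwd = forward_dynamics_mask(T=4, t=1)
inv = inverse_dynamics_mask(T=4, t=1)
pol = policy_mask(T=4, t=1)
```

The point of the sketch is that the three inference-time roles differ only in which tokens are revealed. A network trained to reconstruct trajectories under random masks like `random_mask` has, in principle, seen patterns resembling all of them.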