An open challenge in reinforcement learning (RL) is the effective deployment of a trained policy to new or slightly different situations, as well as to semantically similar environments. We introduce the Symmetry-Invariant Transformer (SiT), a scalable vision transformer (ViT) that leverages both local and global data patterns in a self-supervised manner to improve generalization. Central to our approach is Graph Symmetric Attention, which refines the traditional self-attention mechanism to preserve graph symmetries, yielding invariant and equivariant latent representations. We showcase SiT's superior generalization over ViTs on the MiniGrid and Procgen RL benchmarks, and its sample efficiency on Atari 100k and CIFAR10.
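The invariance and equivariance properties mentioned above can be illustrated with plain self-attention: without positional encodings, softmax attention is already equivariant to permutations of its input tokens, and pooling over tokens makes the representation invariant. The sketch below demonstrates this in numpy; it is a minimal illustration of these two properties, not an implementation of SiT's Graph Symmetric Attention, which extends them to richer graph symmetries.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention, no positional encoding.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V

rng = np.random.default_rng(0)
n, d = 6, 8                           # 6 tokens, 8-dim embeddings (arbitrary)
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(n)             # a symmetry: reorder the tokens
Y = self_attention(X, Wq, Wk, Wv)
Y_perm = self_attention(X[perm], Wq, Wk, Wv)

# Equivariance: permuting the input tokens permutes the outputs identically.
assert np.allclose(Y[perm], Y_perm)
# Invariance: a pooled (mean) representation is unchanged by the permutation.
assert np.allclose(Y.mean(axis=0), Y_perm.mean(axis=0))
```

Standard ViTs break this symmetry by adding absolute positional encodings; preserving the relevant symmetries in the attention mechanism itself is what distinguishes an equivariant design.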