Recent advances in imitation learning have been driven largely by sequence models, which provide a structured flow of information for mimicking task behaviours. The Decision Transformer (DT) and, subsequently, the Hierarchical Decision Transformer (HDT) introduced Transformer-based approaches to learning task policies. More recently, the Mamba architecture has been shown to outperform Transformers across various task domains. In this work, we introduce two novel methods, Decision Mamba (DM) and Hierarchical Decision Mamba (HDM), which aim to improve on the performance of their Transformer counterparts. Through extensive experiments across diverse environments from OpenAI Gym and D4RL, using a variety of demonstration data sets, we show that the Mamba models outperform their Transformer counterparts in a majority of tasks. Results show that HDM outperforms the other methods in most settings. The code can be found at https://github.com/meowatthemoon/HierarchicalDecisionMamba.