We aim to solve the problem of learning coarse-to-fine skills from demonstrations (LfD). To scale precision, traditional LfD approaches often rely on extensive fine-grained demonstrations with external interpolation, or on dynamics models with limited generalization capability. For memory-efficient learning and convenient granularity changes, we propose a novel diffusion-state space model (SSM) based policy (DiSPo) that learns from diverse coarse skills and produces actions at varying control scales by leveraging an SSM, Mamba. Our evaluations show that the adoption of Mamba and the proposed step-scaling method enable DiSPo to outperform baselines in three coarse-to-fine benchmark tests, with up to 81% higher success rates. In addition, DiSPo improves inference efficiency by generating coarse motions in less critical regions. We finally demonstrate the scalability of actions with simulated and real-world manipulation tasks. Code and videos are available at https://robo-dispo.github.io.