Goal-conditioned reinforcement learning has shown considerable potential in robotic manipulation; however, existing approaches remain limited by their reliance on prioritizing collected experience, resulting in suboptimal performance across diverse tasks. Inspired by human learning behaviors, we propose a more comprehensive learning paradigm, ACDC, which integrates multidimensional Adaptive Curriculum (AC) Planning with Dynamic Contrastive (DC) Control to guide the agent along a well-designed learning trajectory. Specifically, at the planning level, the AC component schedules the learning curriculum by dynamically balancing diversity-driven exploration and quality-driven exploitation based on the agent's success rate and training progress. At the control level, the DC component implements the curriculum plan through norm-constrained contrastive learning, enabling magnitude-guided experience selection aligned with the current curriculum focus. Extensive experiments on challenging robotic manipulation tasks demonstrate that ACDC consistently outperforms state-of-the-art baselines in both sample efficiency and final task success rate.
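To make the planning-level idea concrete, the following is a minimal sketch of how a curriculum weight might trade off diversity-driven exploration against quality-driven exploitation as success rate and training progress increase. It is not the ACDC formulation: the abstract does not specify the schedule, so the averaging rule, the softmax sampling, and the names `curriculum_weight` and `sampling_probs` are all assumptions made for illustration. The DC component (norm-constrained contrastive learning) is not covered here.

```python
import math
from typing import List, Sequence


def curriculum_weight(success_rate: float, progress: float) -> float:
    """Hypothetical mixing weight in [0, 1].

    0 means sampling is driven purely by diversity (exploration);
    1 means it is driven purely by quality (exploitation).
    The simple average used here is an assumption, not the paper's rule.
    """
    s = min(max(success_rate, 0.0), 1.0)  # clip to [0, 1]
    p = min(max(progress, 0.0), 1.0)      # clip to [0, 1]
    return 0.5 * (s + p)


def sampling_probs(
    diversity: Sequence[float],
    quality: Sequence[float],
    success_rate: float,
    progress: float,
) -> List[float]:
    """Turn per-experience diversity and quality scores into sampling
    probabilities, blended by the curriculum weight (illustrative only)."""
    if len(diversity) != len(quality) or len(diversity) == 0:
        raise ValueError("diversity and quality must be non-empty and equal length")
    lam = curriculum_weight(success_rate, progress)
    scores = [(1.0 - lam) * d + lam * q for d, q in zip(diversity, quality)]
    # Numerically stable softmax over the blended scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Experience 0 is diverse but low quality; experience 1 is the opposite.
    diversity, quality = [1.0, 0.0], [0.0, 1.0]
    print(sampling_probs(diversity, quality, success_rate=0.0, progress=0.0))
    print(sampling_probs(diversity, quality, success_rate=1.0, progress=1.0))
```

Under these assumptions, an agent early in training with a low success rate samples the diverse experience more often, and the preference shifts toward the high-quality experience as both signals rise.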