This paper proposes a framework that combines task-level reinforcement learning (RL) with motion planning to solve a multi-class in-rack test tube rearrangement problem. At the task level, the framework uses RL to infer a sequence of swap actions and ignores robot motion details. At the motion level, it takes the swap action sequences inferred by the task-level agent and plans the detailed robotic pick-and-place motions. The task-level and motion-level planners form a closed loop through a condition set maintained for each rack slot. This loop allows the framework to replan and still find solutions when low-level failures occur. For the RL component in particular, the framework uses a distributed deep Q-learning structure with a Dueling Double Deep Q-Network (D3QN) to acquire near-optimal policies. It also uses an A${}^\star$-based post-processing technique to augment the collected training data. The D3QN and distributed learning improve training efficiency. The post-processing completes unfinished action sequences and removes redundant actions, which makes the training data more effective. We conduct both simulation and real-world studies to evaluate the proposed framework. The results confirm the performance of the RL and the post-processing, and show that the closed-loop combination improves robustness. The framework can also incorporate various kinds of sensory feedback, as the real-world studies demonstrate.
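The D3QN mentioned above combines two standard ideas. The dueling architecture splits Q(s, a) into a state value V(s) and per-action advantages A(s, a). Double DQN lets the online network select the next action while the target network evaluates it. The sketch below is a minimal, framework-free illustration of those two computations only. It assumes scalar network outputs are already available. The function names, the reward value, and the discount factor are hypothetical, and nothing here reflects the paper's actual network, reward design, or swap-action encoding.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a').

    Subtracting the mean advantage makes V and A identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest element (first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def double_dqn_target(
    reward: float,
    done: bool,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
) -> float:
    """Double DQN TD target for one transition.

    The online network picks a* = argmax_a Q_online(s', a), and the
    target network scores it: y = r + gamma * Q_target(s', a*).
    Terminal transitions do not bootstrap.
    """
    if done:
        return reward
    a_star = argmax(q_online_next)
    return reward + gamma * q_target_next[a_star]


if __name__ == "__main__":
    # Hypothetical head outputs for three candidate swap actions.
    q_next_online = dueling_q(1.0, [0.5, -0.5, 0.0])   # [1.5, 0.5, 1.0]
    q_next_target = [1.2, 2.0, 0.8]                     # target-network values
    y = double_dqn_target(-1.0, False, 0.9, q_next_online, q_next_target)
    print(q_next_online, round(y, 4))
```

The design choice worth noting is the decoupling in `double_dqn_target`. A plain DQN target would take `max(q_next_target)` (2.0 here) and yield a larger, likely overestimated target. The double variant uses the target network's value for the action the online network prefers (1.2 here).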