Approaching robotic cloth manipulation with reinforcement learning (RL) from visual feedback is appealing because robot perception and control can be learned simultaneously. However, the intricate dynamics of cloth and the high dimensionality of the corresponding states pose major challenges, which undermine the practicality of the idea. To tackle these issues, we propose TraKDis, a novel Transformer-based Knowledge Distillation approach that decomposes the visual reinforcement learning problem into two distinct stages. In the first stage, a privileged agent with complete knowledge of the cloth state is trained. This privileged agent acts as a teacher, providing guidance and training signals for the subsequent stage. The second stage is a knowledge distillation procedure, in which the knowledge acquired by the privileged agent is transferred to a vision-based agent by leveraging pre-trained state estimation and weight initialization. In simulated cloth folding tasks, TraKDis outperforms state-of-the-art RL techniques by 21.9%, 13.8%, and 8.3%. Furthermore, to validate robustness, we evaluate the agent in a noisy environment; the results indicate that it can handle and adapt to environmental uncertainties effectively. Real robot experiments are also conducted to showcase the efficiency of our method in real-world scenarios.
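The two-stage decomposition can be illustrated with a deliberately small sketch. It is not the TraKDis implementation: the Transformer networks are replaced by linear maps, the RL-trained teacher is replaced by fixed random weights, and the cloth simulator is replaced by a hypothetical noisy linear `render` function. All names and dimensions (`W_teacher`, `E`, `STATE_DIM`, and so on) are assumptions made for illustration. The sketch shows only the general pattern described above: a state estimator pre-trained on observations, a student policy head initialized from the teacher's weights, and a distillation step that regresses student actions onto teacher actions.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, OBS_DIM, ACTION_DIM = 4, 16, 2  # toy sizes, chosen arbitrarily

# Stage 1 stand-in: a privileged teacher that acts on the full state.
# In the paper this agent is trained with RL; here its weights are just fixed.
W_teacher = rng.normal(size=(ACTION_DIM, STATE_DIM))

def teacher_act(states):
    return states @ W_teacher.T

# Hypothetical observation model standing in for camera images.
A_obs = rng.normal(size=(OBS_DIM, STATE_DIM))

def render(states, noise=0.1):
    return states @ A_obs.T + noise * rng.normal(size=(len(states), OBS_DIM))

# Pre-trained state estimation: least-squares fit from observations to states
# on a small dataset, so the estimator is useful but imperfect.
pre_states = rng.normal(size=(32, STATE_DIM))
E, *_ = np.linalg.lstsq(render(pre_states), pre_states, rcond=None)

# Weight initialization: the student's policy head starts from the teacher's.
W_student = W_teacher.copy()

def student_act(obs, E):
    return (obs @ E) @ W_student.T

def distill_loss(obs, targets, E):
    # Mean squared distance between student and teacher actions.
    return float(np.mean(np.sum((student_act(obs, E) - targets) ** 2, axis=1)))

# Stage 2: distillation. For simplicity only the estimator E is updated;
# the policy head stays frozen at the teacher's weights.
states = rng.normal(size=(512, STATE_DIM))
obs = render(states)
targets = teacher_act(states)

initial_loss = distill_loss(obs, targets, E)

# The loss is a convex quadratic in E, so a step size of 1/L (L = largest
# Hessian eigenvalue) guarantees the loss does not increase.
L = 2.0 / len(obs) * np.linalg.norm(obs, 2) ** 2 * np.linalg.norm(W_student, 2) ** 2
lr = 1.0 / L

for _ in range(200):
    err = student_act(obs, E) - targets
    grad_E = 2.0 / len(obs) * obs.T @ err @ W_student
    E = E - lr * grad_E

final_loss = distill_loss(obs, targets, E)
print(f"distillation loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Freezing the policy head keeps the objective convex, which makes the toy example easy to reason about; a full method would more plausibly fine-tune the whole vision-based agent.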