Digital Twin-Driven Reinforcement Learning for Obstacle Avoidance in Robot Manipulators: A Self-Improving Online Training Framework

The evolution and growing automation of collaborative robots make systems more complex and less predictable, highlighting the need for robots to adapt flexibly to increasingly complex environments. In typical industrial production scenarios, robots often have to be re-programmed when they face a more demanding task or even minor changes in workspace conditions. To increase productivity and efficiency and to reduce human effort in the design process, this paper explores the potential of combining a digital twin with Reinforcement Learning (RL) to enable robots to generate self-improving, collision-free trajectories in real time. The digital twin, acting as a virtual counterpart of the physical system, serves as a 'forward run' for monitoring, controlling, and optimizing the physical system in a safe and cost-effective manner. The physical system synchronizes the digital system by sending it data from camera video feeds, which allows the virtual robot to update its observation and policy based on real scenarios. This bidirectional communication between the digital and physical systems provides a promising platform for hardware-in-the-loop RL training, in which the robot learns by trial and error until it successfully adapts to its new environment. The proposed online training framework is demonstrated on the UFactory xArm 5 collaborative robot, where the robot end-effector aims to reach a target position while avoiding obstacles. The experiments suggest that the proposed framework is capable of online policy training, and that significant room for improvement remains.
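To make the reach-while-avoiding objective concrete, the following is a minimal Python sketch of how a digital twin state might be refreshed from camera detections and then scored by a reward function. It is an illustration under stated assumptions, not the framework's actual implementation: the names `TwinState`, `sync_from_camera`, and `reward`, the safety and goal radii, and the reward magnitudes are all hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # Cartesian position in metres (assumed unit)


@dataclass
class TwinState:
    """Hypothetical snapshot of the digital twin used as the RL observation."""
    end_effector: Point
    target: Point
    obstacles: List[Point]


def sync_from_camera(state: TwinState, detected: Sequence[Point]) -> TwinState:
    """Return a new twin state whose obstacles match the latest camera detections.

    The detection step itself (video feed to obstacle positions) is outside
    this sketch and is assumed to be provided elsewhere.
    """
    return TwinState(state.end_effector, state.target, list(detected))


def reward(state: TwinState, safe_radius: float = 0.10,
           goal_radius: float = 0.02) -> float:
    """Illustrative reward: penalize proximity to obstacles, reward reaching the target.

    The radii and the +/-10 magnitudes are assumed values chosen for illustration.
    """
    # Collision check first: any obstacle inside the safety radius ends badly.
    if any(math.dist(state.end_effector, o) < safe_radius for o in state.obstacles):
        return -10.0
    dist_to_goal = math.dist(state.end_effector, state.target)
    if dist_to_goal < goal_radius:
        return 10.0
    # Otherwise a dense shaping term: closer to the target is better.
    return -dist_to_goal
```

In an online training loop of the kind described above, each step would call `sync_from_camera` with the newest detections before evaluating `reward`, so that the policy update reflects the current physical workspace rather than a stale model of it.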
