Reinforcement Learning-based Switching Controller for a Milliscale Robot in a Constrained Environment

This work presents a reinforcement learning (RL)-based switching control mechanism that autonomously moves a ferromagnetic object (representing a milliscale robot) around obstacles in a constrained environment in the presence of disturbances. Such a mechanism can be used to navigate objects (e.g., capsule endoscopes or swarms of drug particles) through complex environments where active control is necessary but direct manipulation can be hazardous. The proposed control scheme is a switching architecture composed of two sub-controllers. The first sub-controller uses the robot's inverse kinematic solutions to search the environment for the ferromagnetic particle to be carried, while remaining robust to disturbances. The second sub-controller uses a customized Rainbow algorithm to control a robotic arm (a UR5 robot) that carries the ferromagnetic particle to a desired position through the constrained environment. The customized Rainbow algorithm employs the quantile Huber loss from the Implicit Quantile Networks (IQN) algorithm and a ResNet architecture. The controller is first trained and tested in a real-time physics simulation engine (PyBullet). The trained controller is then transferred to a physical UR5 robot to remotely transport a ferromagnetic particle in a real-world scenario, achieving a 98.86% success rate over 30 episodes with randomly generated trajectories, which demonstrates the viability of the approach for real-life applications. In addition, two classical path-planning approaches, Attractor Dynamics and Execution-Extended Rapidly-Exploring Random Trees (ERRT), are investigated and compared with the RL-based method. The proposed RL-based algorithm achieves performance comparable to that of these classical path planners while being more robust to deploy in dynamic environments.
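To make the switching idea concrete, the following is a minimal sketch of how the two sub-controllers could be arbitrated. It is not the authors' implementation: the class `SwitchingController`, the `Mode` enum, and the three callables (`search_policy`, `carry_policy`, `particle_attached`) are hypothetical placeholders. It also assumes that the controller falls back to the search mode whenever the particle is lost, for example because of a disturbance.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Sequence

Action = Sequence[float]  # e.g. a joint-space or Cartesian command (assumed)


class Mode(Enum):
    SEARCH = auto()  # sub-controller 1: IK-based search for the particle
    CARRY = auto()   # sub-controller 2: RL (Rainbow-style) policy carrying it


@dataclass
class SwitchingController:
    # Hypothetical interfaces; the paper does not specify these signatures.
    search_policy: Callable[[dict], Action]     # IK-driven search step
    carry_policy: Callable[[dict], Action]      # greedy action of a trained agent
    particle_attached: Callable[[dict], bool]   # switching condition
    mode: Mode = Mode.SEARCH

    def act(self, obs: dict) -> Action:
        # Assumption: carry while the particle is held, otherwise (re)search.
        self.mode = Mode.CARRY if self.particle_attached(obs) else Mode.SEARCH
        policy = self.carry_policy if self.mode is Mode.CARRY else self.search_policy
        return policy(obs)


ctrl = SwitchingController(
    search_policy=lambda obs: [0.0, 0.0, 0.0],
    carry_policy=lambda obs: [1.0, 0.0, 0.0],
    particle_attached=lambda obs: obs["attached"],
)
```

Keeping the switching condition as a separate predicate means the RL policy only ever sees states in which the particle is already captured, which is one plausible reading of why a switching architecture is used here.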
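The quantile Huber loss borrowed from IQN can be illustrated with a short pure-Python sketch for a single transition. Following the standard IQN definition, each pairwise TD error is δ_ij = target_j − Z_{τ_i}(x, a), it is weighted by |τ_i − 1{δ_ij < 0}| times the Huber loss L_κ(δ_ij)/κ, summed over the predicted quantiles τ_i, and averaged over the target samples. This is not the authors' code; the function names, the default κ = 1, and the unbatched form are assumptions made for readability.

```python
from typing import Sequence


def huber(u: float, kappa: float = 1.0) -> float:
    """Huber loss L_kappa(u): quadratic for |u| <= kappa, linear beyond."""
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)


def quantile_huber_loss(
    pred_quantiles: Sequence[float],  # Z_{tau_i}(x, a) for each sampled tau_i
    taus: Sequence[float],            # quantile fractions tau_i in (0, 1)
    target_samples: Sequence[float],  # r + gamma * Z_{tau'_j}(x', a*)
    kappa: float = 1.0,               # Huber threshold (assumed default)
) -> float:
    """IQN-style quantile Huber loss for one transition (unbatched sketch)."""
    if len(pred_quantiles) != len(taus):
        raise ValueError("need one tau per predicted quantile")
    if not target_samples:
        raise ValueError("need at least one target sample")
    total = 0.0
    for z_i, tau in zip(pred_quantiles, taus):
        for t_j in target_samples:
            delta = t_j - z_i  # pairwise TD error
            # Asymmetric weight: under-estimates cost tau, over-estimates 1 - tau.
            weight = abs(tau - (1.0 if delta < 0 else 0.0))
            total += weight * huber(delta, kappa) / kappa
    # Sum over predicted quantiles, mean over target samples.
    return total / len(target_samples)
```

For example, with a single predicted quantile of 0.0 at τ = 0.5 and a target sample of 2.0, the TD error lies in the linear region of the Huber loss, giving 0.5 × 1.5 = 0.75. In practice this computation would be vectorized over a minibatch in a deep learning framework.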

翻译：本工作提出了一种基于强化学习的切换控制机制，用于在有扰动存在的受限环境中自主驱动铁磁物体（代表毫尺度机器人）绕开障碍物运动。该机制可在需要主动控制但直接操控具有危险性的场景中，引导物体（如胶囊内窥镜、药物粒子群）穿越复杂环境。所提出的控制方案由两个子控制器构成的切换控制架构实现。第一个子控制器利用机器人的逆运动学解，在具备扰动鲁棒性的条件下搜索待携带的铁磁粒子所处环境。第二个子控制器采用定制化Rainbow算法控制机械臂（即UR5机器人），使其在受限环境中将铁磁粒子携带至目标位置。在定制化Rainbow算法中，采用了隐式分位数网络(IQN)算法的分位数Huber损失与残差网络(ResNet)。所提控制器首先在实时物理仿真引擎(PyBullet)中完成训练与测试，随后将训练完成的控制器迁移至UR5机器人，在真实场景中远程传输铁磁粒子。在30轮随机生成轨迹的实验中实现了98.86%的成功率，验证了该方法在现实应用中的可行性。此外，本研究还探讨了两种经典路径规划方法——吸引子动力学与执行扩展快速搜索随机树(ERRT)，并与基于强化学习的方法进行对比。实验表明，所提出的基于强化学习的算法在保持与经典路径规划器相当性能的同时，在动态环境部署中展现出更强的鲁棒性。