Offloading computationally expensive algorithms to the edge or even the cloud is an attractive way to overcome the limited on-board computational and energy resources of robotic systems. For cloud-native applications deployed with the container orchestration system Kubernetes (K8s), a key problem is ensuring resilience against various types of failures. However, complex robotic systems that interact with the physical world pose a specific set of challenges and requirements that existing failure mitigation approaches from the cloud-native domain do not yet cover. In this paper, we therefore propose a novel approach to monitoring and stateful, reactive failure mitigation for distributed robotic systems deployed using K8s and the Robot Operating System (ROS2). Because it builds on the generic substrate of Behaviour Trees, our approach can be applied to any robotic workload and supports arbitrarily complex monitoring and failure mitigation strategies. We demonstrate the effectiveness and application-agnostic nature of our approach on two example applications, namely Autonomous Mobile Robot (AMR) navigation and robotic manipulation in a simulated environment.