Adaptive Reinforcement and Model Predictive Control Switching for Safe Human-Robot Cooperative Navigation

This paper addresses the challenge of human-guided navigation for mobile collaborative robots under simultaneous proximity regulation and safety constraints. We introduce Adaptive Reinforcement and Model Predictive Control Switching (ARMS), a hybrid learning-control framework that integrates a reinforcement learning follower trained with Proximal Policy Optimization (PPO) and an analytical one-step Model Predictive Control (MPC) controller formulated as a quadratic program (QP) safety filter. To enable robust perception under partial observability and non-stationary human motion, ARMS employs a decoupled sensing architecture with a Long Short-Term Memory (LSTM) temporal encoder for the human-robot relative state and a spatial encoder for 360-degree LiDAR scans. The core contribution is a learned adaptive neural switcher that performs context-aware soft action fusion between the two controllers. It favors conservative, constraint-aware QP-based control in low-risk regions, progressively shifts control authority to the learned follower in highly cluttered or constrained scenarios where maneuverability is critical, and reverts to the follower action when the QP becomes infeasible. Extensive evaluations against Pure Pursuit, Dynamic Window Approach (DWA), and an RL-only baseline demonstrate that ARMS achieves an 82.5 percent success rate in highly cluttered environments, outperforming DWA and RL-only approaches by 7.1 percent and 3.1 percent, respectively, while reducing average computational latency by 33 percent to 5.2 milliseconds compared to a multi-step MPC baseline. Additional simulation transfer in Gazebo and initial real-world deployment results further indicate the practicality and robustness of ARMS for safe and efficient human-robot collaboration. Source code and a demonstration video are available at https://github.com/21ning/ARMS.git.
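
The soft action fusion and infeasibility fallback described in the abstract can be illustrated with a minimal sketch. It is not the ARMS implementation: the function names `qp_safety_filter` and `fuse_actions`, the two-dimensional action layout, the single linear safety constraint, and the scalar weight `w` (which in ARMS would come from the learned switcher) are all assumptions made for illustration. With only one linear constraint, the QP reduces to a closed-form projection onto a half-space, so no solver is needed here.

```python
from typing import Optional, Sequence, Tuple

# Assumed action layout: (linear velocity, angular velocity).
Action = Tuple[float, float]


def qp_safety_filter(u_ref: Sequence[float], a: Sequence[float], b: float) -> Optional[Action]:
    """Solve min ||u - u_ref||^2 subject to a . u <= b in closed form.

    Returns None when the constraint cannot be satisfied (a == 0 and b < 0).
    """
    dot = a[0] * u_ref[0] + a[1] * u_ref[1]
    if dot <= b:
        return (u_ref[0], u_ref[1])  # reference action is already safe
    norm_sq = a[0] ** 2 + a[1] ** 2
    if norm_sq == 0.0:
        return None  # infeasible: no action satisfies 0 <= b when b < 0
    scale = (dot - b) / norm_sq
    # Project u_ref onto the boundary of the half-space a . u <= b.
    return (u_ref[0] - scale * a[0], u_ref[1] - scale * a[1])


def fuse_actions(a_rl: Sequence[float], a_qp: Optional[Sequence[float]], w: float) -> Action:
    """Blend the RL follower action and the QP action.

    w is the weight on the RL follower, clamped to [0, 1].
    If the QP was infeasible (a_qp is None), fall back to the RL action.
    """
    if a_qp is None:
        return (a_rl[0], a_rl[1])
    w = min(1.0, max(0.0, w))
    return (w * a_rl[0] + (1.0 - w) * a_qp[0],
            w * a_rl[1] + (1.0 - w) * a_qp[1])


if __name__ == "__main__":
    a_rl = (1.0, 0.0)
    # Hypothetical constraint: forward speed must not exceed 0.5.
    a_qp = qp_safety_filter(a_rl, a=(1.0, 0.0), b=0.5)
    print(fuse_actions(a_rl, a_qp, w=0.25))
```

A small `w` keeps the fused command close to the filtered action, while a `w` near 1 hands authority to the follower, mirroring the behavior the abstract attributes to the switcher.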
