Achieving safe and coordinated behavior in dynamic, constraint-rich environments remains a major challenge for learning-based control. Pure end-to-end learning often suffers from poor sample efficiency and limited reliability, while model-based methods depend on predefined references and struggle to generalize. We propose a hierarchical framework that combines tactical decision-making via reinforcement learning (RL) with low-level execution through Model Predictive Control (MPC). In the multi-agent setting, high-level policies select abstract targets from structured regions of interest (ROIs), while MPC ensures dynamically feasible and safe motion. Evaluated on a predator-prey benchmark, our approach outperforms end-to-end and shielding-based RL baselines in terms of reward, safety, and consistency, underscoring the benefits of combining structured learning with model-based control.
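The division of labor described above can be illustrated with a minimal sketch. This is not the paper's implementation: the ROI set, the value-based target selection, and the greedy receding-horizon tracker (`select_roi`, `mpc_step`, the obstacle safety radius) are all illustrative assumptions standing in for the learned policy and the full MPC solver.

```python
import numpy as np

# Hypothetical sketch of the hierarchical loop: a high-level policy selects an
# abstract target from a set of regions of interest (ROIs), and a low-level
# receding-horizon tracker moves toward it while respecting a safety constraint.
ROIS = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])  # candidate target regions

def select_roi(q_values):
    """High-level decision: pick the ROI with the highest estimated value
    (stand-in for the trained RL policy)."""
    return ROIS[int(np.argmax(q_values))]

def mpc_step(state, target, horizon=10, step=0.2, obstacle=None, radius=1.0):
    """Greedy stand-in for MPC: take up to `horizon` small steps toward the
    target, rejecting any step that would enter the obstacle's safety radius."""
    for _ in range(horizon):
        direction = target - state
        dist = np.linalg.norm(direction)
        if dist < 1e-6:
            break
        proposal = state + step * direction / dist
        if obstacle is not None and np.linalg.norm(proposal - obstacle) < radius:
            break  # constraint would be violated: stop short (safe by construction)
        state = proposal
    return state

state = np.array([4.0, 4.0])
target = select_roi(q_values=np.array([0.1, 0.9, 0.3]))  # selects ROI [5, 0]
state = mpc_step(state, target, obstacle=np.array([4.5, 2.0]))
```

Because the tracker only ever commits proposals that satisfy the constraint, the final state is guaranteed to remain outside the obstacle's safety radius, mirroring how the MPC layer shields the learned policy's targets from producing unsafe motion.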