ManeuverNet: A Soft Actor-Critic Framework for Precise Maneuvering of Double-Ackermann-Steering Robots with Optimized Reward Functions

Autonomous control of double-Ackermann-steering robots is essential in agricultural applications, where robots must execute precise and complex maneuvers within a limited space. Classical methods, such as the Timed Elastic Band (TEB) planner, can address this problem, but they rely on parameter tuning, making them highly sensitive to changes in robot configuration or environment and impractical to deploy without constant recalibration. At the same time, end-to-end deep reinforcement learning (DRL) methods often fail due to unsuitable reward functions for non-holonomic constraints, resulting in sub-optimal policies and poor generalization. To address these challenges, this paper presents ManeuverNet, a DRL framework tailored for double-Ackermann systems, combining Soft Actor-Critic with CrossQ. Furthermore, ManeuverNet introduces four specifically designed reward functions to support maneuver learning. Unlike prior work, ManeuverNet does not depend on expert data or handcrafted guidance. We extensively evaluate ManeuverNet against both state-of-the-art DRL baselines and the TEB planner. Experimental results demonstrate that our framework substantially improves maneuverability and success rates, achieving more than a 40% gain over DRL baselines. Moreover, ManeuverNet effectively mitigates the strong parameter sensitivity observed in the TEB planner. In real-world trials, ManeuverNet achieved up to a 90% increase in maneuvering trajectory efficiency, highlighting its robustness and practical applicability.
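To make the non-holonomic constraint mentioned in the abstract concrete, here is a minimal kinematic sketch of a double-Ackermann vehicle. It is not taken from the paper. It assumes a textbook simplification: a bicycle model with symmetric counter-phase steering (front axle at +delta, rear axle at -delta), a reference point midway between the axles, and no wheel slip. The function names `double_ackermann_step` and `min_turning_radius` and all parameter names are hypothetical.

```python
import math


def double_ackermann_step(x, y, theta, v, delta, wheelbase, dt):
    """Advance the pose (x, y, theta) by one explicit Euler step.

    Assumption (not from the paper): front wheels steer by +delta and rear
    wheels by -delta, so the slip angle at the midpoint is zero and the
    yaw rate is 2 * v * tan(delta) / wheelbase.
    """
    yaw_rate = 2.0 * v * math.tan(delta) / wheelbase
    # Non-holonomic constraint: velocity is always along the heading,
    # so the robot cannot translate sideways.
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += yaw_rate * dt
    return x, y, theta


def min_turning_radius(delta_max, wheelbase):
    """Smallest turning radius under the same symmetric-steering assumption."""
    return wheelbase / (2.0 * math.tan(delta_max))
```

Under these assumptions, the bounded turning radius is what forces multi-point maneuvers (alternating forward and reverse segments) in tight spaces, which is the kind of behavior a goal-distance-only reward tends to handle poorly.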