High-precision control tasks present substantial challenges for reinforcement learning (RL) algorithms, frequently resulting in suboptimal performance attributable to network approximation inaccuracies and inadequate sample quality. These issues are exacerbated when the task requires the agent to reach a precise goal state, as is common in robotics and other real-world applications. We introduce the Adviser-Actor-Critic (AAC), designed to address this precision-control dilemma by combining the precision of feedback control theory with the adaptive learning capability of RL. AAC features an Adviser that mentors the actor to refine control actions, thereby improving the precision of goal attainment. In benchmark tests, AAC outperformed standard RL algorithms on precision-critical, goal-conditioned tasks, demonstrating high precision, reliability, and robustness. Code is available at: https://anonymous.4open.science/r/Adviser-Actor-Critic-8AC5.