Reinforcement learning (RL) has become a foundational approach for enabling intelligent robotic behavior in dynamic and uncertain environments. This work presents an in-depth review of RL principles, advanced deep reinforcement learning (DRL) algorithms, and their integration into robotic and control systems. Beginning with the formalism of Markov Decision Processes (MDPs), the study outlines essential elements of the agent-environment interaction and explores core algorithmic strategies including actor-critic methods, value-based learning, and policy gradients. Emphasis is placed on modern DRL techniques such as DDPG, TD3, PPO, and SAC, which have shown promise in solving high-dimensional, continuous control tasks. A structured taxonomy is introduced to categorize RL applications across domains such as locomotion, manipulation, multi-agent coordination, and human-robot interaction, along with training methodologies and deployment readiness levels. The review synthesizes recent research efforts, highlighting technical trends, design patterns, and the growing maturity of RL in real-world robotics. Overall, this work aims to bridge theoretical advances with practical implementations, providing a consolidated perspective on the evolving role of RL in autonomous robotic systems.