This study compares single-objective and multi-objective reinforcement learning methods for training a robot to navigate to a goal while avoiding obstacles. Three traditional single-objective algorithms, Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed DDPG (TD3), are evaluated in the Gazebo simulation framework across a variety of environments with randomized goal and robot starting locations. These methods give the robot a single numerical reward that indicates the quality of an action in relation to the goal. Their limitations become apparent in complex settings where multiple, potentially conflicting objectives are present. To address these limitations, we propose an approach based on Multi-Objective Reinforcement Learning (MORL). The reward function is modified to return a vector of rewards, one per objective, so that the robot learns a policy that balances the different objectives and aims for a Pareto-optimal solution. The comparison highlights the potential of MORL for complex, dynamic robotic navigation tasks and sets the stage for future investigations into more adaptable and robust robotic behaviors.
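As a rough illustration of the vector-valued reward idea, the sketch below contrasts a two-component reward with its weighted scalarization and adds a Pareto-dominance check. The abstract does not specify the objectives, so the two components used here (progress toward the goal and obstacle clearance), the `safe_dist` threshold, and all function names are assumptions made for this example only, not the reward design used in the study.

```python
from typing import List, Sequence, Tuple

Reward = Tuple[float, float]


def vector_reward(dist_prev: float, dist_now: float,
                  min_obstacle_dist: float, safe_dist: float = 0.5) -> Reward:
    """Return one reward per objective (hypothetical objectives).

    progress: reduction in distance to the goal during this step.
    safety:   0 when clearance is at least safe_dist, negative otherwise.
    """
    progress = dist_prev - dist_now
    safety = min(0.0, min_obstacle_dist - safe_dist)
    return (progress, safety)


def scalarize(reward: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a reward vector into the single number a scalar-reward agent sees."""
    return sum(w * r for w, r in zip(weights, reward))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points: List[Reward]) -> List[Reward]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    r = vector_reward(dist_prev=2.0, dist_now=1.5, min_obstacle_dist=0.25)
    print("vector reward:", r)
    print("scalarized:", scalarize(r, weights=(1.0, 2.0)))
    # Hypothetical per-policy returns as (progress, safety) pairs.
    candidates = [(1.0, -0.5), (0.5, 0.0), (0.4, -0.6)]
    print("Pareto front:", pareto_front(candidates))
```

Scalarization commits to one trade-off through the chosen weights, whereas keeping the vector lets candidate policies be compared by dominance, which is the sense in which a multi-objective learner can aim for a Pareto-optimal solution.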