This study presents a comparative analysis of single-objective and multi-objective reinforcement learning methods for training a robot to navigate to a goal while avoiding obstacles. Three traditional reinforcement learning techniques, namely Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed DDPG (TD3), are evaluated in the Gazebo simulation framework across a variety of environments, with parameters such as randomized goal and robot starting locations. These methods give the robot a single numerical reward that indicates the quality of an action in relation to the goal. However, their limitations become apparent in complex settings where multiple, potentially conflicting objectives are present. To address these limitations, we propose an approach based on Multi-Objective Reinforcement Learning (MORL). By modifying the reward function to return a vector of rewards, each component pertaining to a distinct objective, the robot learns a policy that balances the different objectives, with the aim of reaching a Pareto-optimal solution. This comparative study highlights the potential of MORL in complex, dynamic robotic navigation tasks and sets the stage for future investigations into more adaptable and robust robotic behaviors.
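To make the idea of a vector-valued reward and Pareto optimality concrete, the following is a minimal sketch, not the implementation used in this study. It assumes two objectives (progress toward the goal and obstacle clearance); the function names, the `safe_dist` threshold, and the reward shaping are hypothetical choices made only for illustration.

```python
from typing import List, Sequence, Tuple


def vector_reward(
    dist_to_goal: float,
    prev_dist_to_goal: float,
    min_obstacle_dist: float,
    safe_dist: float = 0.5,  # hypothetical safety margin in meters
) -> Tuple[float, float]:
    """Return one reward component per objective instead of a single scalar.

    Component 0: progress toward the goal (positive when the robot gets closer).
    Component 1: obstacle clearance (0 when outside the safety margin,
                 negative in proportion to how far the margin is violated).
    """
    goal_progress = prev_dist_to_goal - dist_to_goal
    clearance = min(0.0, min_obstacle_dist - safe_dist)
    return (goal_progress, clearance)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical per-policy returns: (goal progress, obstacle clearance).
    candidate_returns = [(1.0, 0.0), (0.5, -0.2), (0.2, 0.0), (1.2, -0.5)]
    print(pareto_front(candidate_returns))
```

In this sketch, a policy that reaches the goal faster but cuts closer to obstacles and a slower but safer policy can both lie on the Pareto front, which is the trade-off a single scalar reward hides.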