Recent research has demonstrated the potential of reinforcement learning (RL) to enable effective multi-robot collaboration, particularly in social dilemmas where robots face a trade-off between self-interest and collective benefit. However, environmental factors such as miscommunication and adversarial robots can undermine cooperation, making it crucial to explore how multi-robot communication can be manipulated to achieve different outcomes. This paper presents PIMbot, a novel approach to manipulating the reward function in multi-robot collaboration through two distinct forms of manipulation: policy manipulation and incentive manipulation. Our work introduces a new angle for manipulation in recent multi-agent RL social dilemmas that use a unique reward function for incentivization. With the proposed PIMbot mechanisms, a robot can manipulate the social dilemma environment effectively. PIMbot can affect the task outcome positively or negatively: positive impacts lead to faster convergence to the global optimum and maximized rewards for any chosen robot, whereas negative impacts degrade overall task performance. We present comprehensive experimental results that demonstrate the effectiveness of the proposed methods in a Gazebo-simulated multi-robot environment. Our work provides insights into how inter-robot communication can be manipulated and has implications for various robotic applications.