When independently trained or designed robots are deployed in a shared environment, their combined actions can lead to unintended negative side effects (NSEs). To ensure safe and efficient operation, robots must optimize task performance while minimizing the penalties associated with NSEs, balancing individual objectives with collective impact. We model the problem of mitigating NSEs in a cooperative multi-agent system as a bi-objective lexicographic decentralized Markov decision process. We assume that transitions and rewards are independent with respect to the robots' tasks, although the joint NSE penalty introduces a form of dependence among the robots. To improve scalability, the joint NSE penalty is decomposed into individual penalties for each robot using credit assignment, which facilitates decentralized policy computation. Experiments in simulation and on mobile robots demonstrate the effectiveness and scalability of our approach in mitigating NSEs.
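To make the idea of decomposing a joint NSE penalty into per-robot penalties concrete, the following is a minimal sketch using difference rewards, a standard counterfactual credit-assignment scheme. The abstract does not say which credit-assignment rule is used, so this is an illustrative stand-in rather than the paper's method. The penalty function `joint_nse_penalty`, the cell labels, and the constant 5.0 are all hypothetical.

```python
from collections import Counter
from typing import Dict


def joint_nse_penalty(cells: Dict[str, str]) -> float:
    """Hypothetical joint penalty: 5.0 for each extra robot sharing a cell."""
    counts = Counter(cells.values())
    return 5.0 * sum(max(0, n - 1) for n in counts.values())


def difference_credit(cells: Dict[str, str]) -> Dict[str, float]:
    """Assign each robot the drop in joint penalty if it were removed.

    This is the difference-reward rule (an assumption of this sketch):
    credit_i = P(all robots) - P(all robots except i).
    """
    total = joint_nse_penalty(cells)
    credit = {}
    for robot in cells:
        # Counterfactual joint configuration without this robot.
        without = {r: c for r, c in cells.items() if r != robot}
        credit[robot] = total - joint_nse_penalty(without)
    return credit


if __name__ == "__main__":
    # r1 and r2 crowd cell "A"; r3 is alone in cell "B".
    print(difference_credit({"r1": "A", "r2": "A", "r3": "B"}))
    # -> {'r1': 5.0, 'r2': 5.0, 'r3': 0.0}
```

Each robot can then minimize its own assigned penalty as a secondary objective without reasoning about the full joint action space. Note that difference-based credits need not sum to the joint penalty: in the example the joint penalty is 5.0 while the credits sum to 10.0.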