When humans and autonomous systems operate together as a hybrid team, the team should operate successfully and effectively. We refer to team members as agents. Our proposed framework addresses hybrid teams in which, at any time, only one team member (the control agent) is authorized to act as control for the team. To select the best control agent, we propose adding an AI manager, trained via Reinforcement Learning, which learns as an outside observer of the team. The manager learns a behavior model that links observations of agent performance to the environment in which the team operates, and uses these observations to select the most desirable control agent. We restrict the manager's task by introducing a set of constraints. These constraints define acceptable team operation: a violation occurs when the team enters an unacceptable condition that requires manager intervention. To limit the added complexity and potential inefficiency for the team, the manager should minimize the number of constraint violations and the interventions they trigger. The manager therefore optimizes its selection of authorized agents to boost overall team performance while minimizing the frequency of its interventions. We demonstrate the manager's performance in a simulated driving scenario in which the hybrid team consists of a human driver and an autonomous driving system. Our experiments include interfering vehicles, which create the need for collision avoidance and proper speed control. The results indicate a positive impact of the manager, with team performance in some cases reaching up to ~187% of the best solo agent's performance.
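The delegation idea described above can be illustrated with a deliberately small sketch. It is not the method used in this work: it replaces the Reinforcement Learning manager with a one-step, bandit-style value update, and every name and number in it (`AGENTS`, `CONTEXTS`, `VIOLATION_PROB`, `INTERVENTION_COST`, `MAX_SEGMENT`) is a hypothetical assumption chosen only for illustration. The manager picks a control agent for an observed context, the agent acts until it violates a constraint, and the manager is rewarded for the violation-free steps minus a fixed cost for having to intervene.

```python
import random

# Hypothetical team members and observed contexts (illustrative only).
AGENTS = ("human", "autonomous")
CONTEXTS = ("clear_road", "interfering_vehicle")

# Assumed per-step probability that an agent violates a constraint.
# These are invented toy values, not results from the work described above.
VIOLATION_PROB = {
    ("clear_road", "human"): 0.05,
    ("clear_road", "autonomous"): 0.30,
    ("interfering_vehicle", "human"): 0.40,
    ("interfering_vehicle", "autonomous"): 0.10,
}
INTERVENTION_COST = 1.0  # assumed penalty for each manager intervention
MAX_SEGMENT = 50         # cap on steps in one delegation segment


def run_segment(context, agent, rng):
    """Let `agent` control until a violation or the cap; return steps completed."""
    steps = 0
    while steps < MAX_SEGMENT:
        if rng.random() < VIOLATION_PROB[(context, agent)]:
            break  # constraint violated: the manager must intervene
        steps += 1
    return steps


def train_manager(episodes=5000, alpha=0.05, epsilon=0.1, seed=0):
    """Learn a value table over (context, agent) pairs with epsilon-greedy choice."""
    rng = random.Random(seed)
    q = {(c, a): 0.0 for c in CONTEXTS for a in AGENTS}
    for _ in range(episodes):
        context = rng.choice(CONTEXTS)
        if rng.random() < epsilon:
            agent = rng.choice(AGENTS)  # explore
        else:
            agent = max(AGENTS, key=lambda a: q[(context, a)])  # exploit
        # Reward: violation-free steps, minus the cost of the next intervention.
        reward = run_segment(context, agent, rng) - INTERVENTION_COST
        # Move the estimate toward the observed reward (one-step update).
        q[(context, agent)] += alpha * (reward - q[(context, agent)])
    return q


def select_agent(q, context):
    """Return the agent with the highest learned value for this context."""
    return max(AGENTS, key=lambda a: q[(context, a)])


if __name__ == "__main__":
    table = train_manager()
    for ctx in CONTEXTS:
        print(ctx, "->", select_agent(table, ctx))
```

Under these assumed probabilities, one would expect the learned table to favor the agent that goes longer without a violation in each context, which is the behavior the manager objective above is meant to encourage. A sequential formulation would additionally bootstrap from the value of the next state; that step is omitted here to keep the sketch short.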