The development of ethical AI systems currently centers on specifying objective functions that align with human objectives. However, finding such functions remains an open research challenge, while in reinforcement learning (RL), setting rewards by hand is still fairly standard practice. We present a methodology for dynamic value alignment, in which the values to be aligned with change over time, using a multi-objective approach. We apply this approach to extend Deep $Q$-Learning to accommodate multiple objectives and evaluate the method on a simplified two-leg intersection controlled by a switching agent. Our approach dynamically accommodates the preferences of drivers in the system and achieves better overall performance across three metrics (speeds, stops, and waits) while integrating objectives that have competing or conflicting actions.
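To make the idea of acting on several objectives under changing preferences concrete, the following is a minimal sketch, not the method evaluated here. It assumes linear scalarization: each action has one estimated $Q$-value per objective, and a preference weight vector collapses them into a single score. The function names, the two actions, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def scalarize(q_values: Sequence[Sequence[float]],
              weights: Sequence[float]) -> List[float]:
    """Collapse per-objective Q-values into one score per action.

    q_values[a][k] is the estimated return of action a under objective k.
    weights[k] is the current preference weight for objective k
    (hypothetical; assumed to come from the drivers' stated preferences).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalized = [w / total for w in weights]
    return [sum(w * q for w, q in zip(normalized, row)) for row in q_values]


def greedy_action(q_values: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> int:
    """Return the index of the action with the highest scalarized score."""
    scores = scalarize(q_values, weights)
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative values only: two actions for a switching agent
# (0 = keep the current phase, 1 = switch) and three objectives
# ordered as (speed, stops, waits).
q = [
    [1.0, -0.5, -2.0],  # keep
    [0.2, -1.5, -0.3],  # switch
]
speed_focused = [0.8, 0.1, 0.1]
wait_focused = [0.1, 0.1, 0.8]

print(greedy_action(q, speed_focused))
print(greedy_action(q, wait_focused))
```

With these made-up numbers, the preferred action changes when the weights shift from speed toward waiting time, even though the per-objective $Q$-values are unchanged. That is the sense in which the objectives can have conflicting actions.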