The many Artificial Intelligence systems now deployed need to be aligned with our ethical considerations. However, these considerations may change over time: society is not fixed, and social mores evolve. AI systems must therefore adapt to shifting ethical expectations, a challenge that has remained under-studied, especially in the field of Machine Ethics. In this paper, we present two algorithms, QSOM and QDSOM, which adapt to changes in the environment and, in particular, to changes in the reward function, which represents the ethical considerations we want these systems to be aligned with. They combine the well-known Q-Table with (Dynamic) Self-Organizing Maps to handle continuous, multi-dimensional state and action spaces. We evaluate them on a multi-agent energy distribution use case within a small Smart Grid neighborhood, and show their ability to adapt and their higher performance compared to baseline Reinforcement Learning algorithms.
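To make the idea of pairing a Q-Table with a Self-Organizing Map more concrete, here is a minimal sketch. It is not the QSOM or QDSOM algorithm, whose details are not given in this abstract. It only illustrates the general principle: a SOM maps a continuous state onto its closest prototype, and that prototype's index selects a row of an ordinary Q-table. Everything in it is an assumption made for illustration: the class name `SomQTable`, its methods and parameters, the 1-D map topology, and the use of a small discrete action set (the abstract states that the action space is also continuous). The constant learning rates are a simple way to keep adapting when the reward function changes.

```python
import random


class SomQTable:
    """Toy pairing of a Self-Organizing Map with a Q-table (illustrative only)."""

    def __init__(self, n_units, state_dim, n_actions, seed=0):
        rng = random.Random(seed)
        # One prototype vector per SOM unit, initialised uniformly in [0, 1).
        self.prototypes = [[rng.random() for _ in range(state_dim)]
                           for _ in range(n_units)]
        # Q-table: one row per SOM unit, one column per discrete action.
        self.q = [[0.0] * n_actions for _ in range(n_units)]

    def best_matching_unit(self, state):
        """Return the index of the prototype closest to `state`."""
        def dist(unit):
            return sum((p - s) ** 2
                       for p, s in zip(self.prototypes[unit], state))
        return min(range(len(self.prototypes)), key=dist)

    def update_som(self, state, lr=0.5, radius=1):
        """Move the best-matching unit and its 1-D neighbours towards `state`."""
        bmu = self.best_matching_unit(state)
        for unit in range(len(self.prototypes)):
            d = abs(unit - bmu)
            if d <= radius:
                step = lr / (1 + d)  # neighbours move less than the winner
                self.prototypes[unit] = [
                    p + step * (s - p)
                    for p, s in zip(self.prototypes[unit], state)
                ]
        return bmu

    def update_q(self, state, action, reward, next_state,
                 alpha=0.1, gamma=0.9):
        """Tabular Q-learning update on the SOM-discretised states."""
        s = self.best_matching_unit(state)
        s_next = self.best_matching_unit(next_state)
        target = reward + gamma * max(self.q[s_next])
        self.q[s][action] += alpha * (target - self.q[s][action])
        return s


# Hypothetical usage: one observation, one transition.
agent = SomQTable(n_units=4, state_dim=2, n_actions=3)
unit = agent.update_som([0.2, 0.8])
agent.update_q([0.2, 0.8], action=1, reward=1.0, next_state=[0.3, 0.7])
```

In this sketch the SOM and the Q-table are updated independently. How the two are actually coupled, and how actions are represented, is specific to QSOM and QDSOM and is not reproduced here.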