Constructing a universal moral code for artificial intelligence (AI) is difficult or even impossible, given that different human cultures have different definitions of morality and different societal norms. We therefore argue that the value system of an AI should be culturally attuned: just as a child raised in a particular culture learns the specific values and norms of that culture, we propose that an AI agent operating in a particular human community should acquire that community's moral, ethical, and cultural codes. How AI systems might acquire such codes from human observation and interaction remains an open question. Here, we propose inverse reinforcement learning (IRL) as a method for AI agents to implicitly acquire a culturally attuned value system. We test our approach using an experimental paradigm in which AI agents use IRL to learn different reward functions, which govern the agents' moral values, by observing the behavior of different cultural groups in an online virtual world requiring real-time decision making. We show that an AI agent learning from the average behavior of a particular cultural group can acquire altruistic characteristics reflective of that group's behavior, and that this learned value system can generalize to new scenarios requiring altruistic judgments. Our results provide, to our knowledge, the first demonstration that AI agents could potentially be endowed with the ability to continually learn their values and norms from observing and interacting with humans, thereby becoming attuned to the culture in which they operate.
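To make the idea of learning a group-specific reward function from observed behavior concrete, the following is a minimal sketch of maximum-likelihood IRL in a one-step choice setting. It is not the experimental paradigm or model described above: the action set, the two payoff features, the demonstration counts, and all function names are hypothetical and chosen only for illustration. The sketch assumes a linear reward over features and a softmax choice rule, fits one weight vector per group, and then applies the learned weights to an unseen choice.

```python
import math

# Hypothetical one-step decision. Each action is described by two features:
# (payoff to self, payoff to other). The values are illustrative only.
ACTIONS = [
    (1.0, 0.0),  # keep the resource
    (0.5, 0.5),  # share it equally
    (0.0, 1.0),  # give it all away
]

def choice_probs(w, actions):
    """Softmax policy under an assumed linear reward r(a) = w . phi(a)."""
    scores = [sum(wi * fi for wi, fi in zip(w, phi)) for phi in actions]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_reward(demos, actions, lr=0.5, steps=2000):
    """Fit w by gradient ascent on the log-likelihood of observed choices.

    The gradient is the empirical feature mean minus the feature mean
    predicted by the current softmax policy.
    """
    n_feat = len(actions[0])
    emp = [sum(actions[a][k] for a in demos) / len(demos) for k in range(n_feat)]
    w = [0.0] * n_feat
    for _ in range(steps):
        p = choice_probs(w, actions)
        model = [sum(p[i] * actions[i][k] for i in range(len(actions)))
                 for k in range(n_feat)]
        w = [wk + lr * (e - m) for wk, e, m in zip(w, emp, model)]
    return w

def expected_other_payoff(w, actions):
    """Expected payoff to the other party under the learned policy."""
    p = choice_probs(w, actions)
    return sum(pi * phi[1] for pi, phi in zip(p, actions))

# Made-up demonstrations (action indices) for two hypothetical groups.
demos_a = [0] * 7 + [1] * 2 + [2] * 1  # mostly keeps
demos_b = [0] * 1 + [1] * 3 + [2] * 6  # mostly gives

w_a = fit_reward(demos_a, ACTIONS)
w_b = fit_reward(demos_b, ACTIONS)

# A new, unseen choice: take a larger payoff alone, or split a joint payoff.
NEW_ACTIONS = [(2.0, 0.0), (1.0, 1.0)]
p_split_a = choice_probs(w_a, NEW_ACTIONS)[1]
p_split_b = choice_probs(w_b, NEW_ACTIONS)[1]
print("P(split | group A weights):", round(p_split_a, 3))
print("P(split | group B weights):", round(p_split_b, 3))
```

In this toy setting the agent fitted to the second group places a higher weight on the other party's payoff and is more likely to split in the unseen choice. Because the two payoffs sum to one in every training action, only the difference between the two weights is identified from the demonstrations; initializing at zero keeps the weights symmetric, and the result on the new choice depends on that convention.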