Across the diverse body of work in psychology, philosophy and the social sciences that investigates the nature of human values, there is a clear consensus that values guide behaviour. More recently, values have also come to be recognised as a means of engineering ethical AI. Indeed, Stuart Russell proposed shifting AI's focus away from simply ``intelligence'' towards intelligence ``provably aligned with human values''. This challenge, the value alignment problem, together with related challenges such as how an AI can learn human values, how individual values can be aggregated to the group level, and how computational mechanisms can be designed to reason over values, has energised a sustained research effort. Despite this, no formal, computational definition of values has yet been proposed. We address this gap with a formal conceptual framework, rooted in the social sciences, that provides a foundation for the systematic, integrated and interdisciplinary investigation of how human values can support the design of ethical AI.