In the diverse array of work investigating the nature of human values from psychology, philosophy and the social sciences, there is a clear consensus that values guide behaviour. More recently, a recognition has emerged that values provide a means to engineer ethical AI. Indeed, Stuart Russell proposed shifting AI's focus away from simply ``intelligence'' towards intelligence ``provably aligned with human values''. This challenge -- the value alignment problem -- together with related problems, including how an AI learns human values, how individual values are aggregated into group values, and how computational mechanisms can reason over values, has energised a sustained research effort. Despite this, no formal, computational definition of values has yet been proposed. We address this gap through a formal conceptual framework, rooted in the social sciences, that provides a foundation for the systematic, integrated and interdisciplinary investigation of how human values can support the design of ethical AI.
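To make the target of such a definition concrete, one common style of formalisation in the existing literature (a minimal illustrative sketch; the symbols and reading below are our assumptions, not the framework proposed in this paper) treats each value as inducing a preference ordering over states of affairs:
\[
V = \{v_1, \dots, v_n\}, \qquad \preceq_{v} \;\subseteq\; S \times S \quad \text{for each } v \in V,
\]
where $S$ is a set of states of affairs and $s \preceq_{v} s'$ reads ``$s'$ promotes value $v$ at least as much as $s$''. On this reading, values guide behaviour in that an agent prefers action $a$ over $a'$ whenever the state $a$ brings about is ranked higher under the (possibly aggregated) orderings $\preceq_{v}$.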