One of today's most significant societal challenges is building AI systems whose behaviour, or the behaviour they enable within communities of interacting agents (human and artificial), aligns with human values. To address this challenge, we detail a formal model of human values that supports their explicit computational representation. To our knowledge, this has not yet been attempted, which is surprising given the growing volume of research integrating values within AI. We take as our starting point the wealth of social psychology research from the last few decades on the nature of human values. We show how this model can provide the foundational apparatus for AI-based reasoning over values, and we demonstrate its applicability in real-world use cases. We illustrate how our model captures the key ideas from social psychology research and propose a roadmap for future integrated, interdisciplinary research into human values in AI. The ability to reason automatically over values not only helps address the value alignment problem but also facilitates the design of AI systems that can support individuals and communities in making more informed, value-aligned decisions. Increasingly, individuals and organisations want to understand their values more explicitly and to explore whether their behaviours and attitudes properly reflect them. Our work on modelling human values will enable AI systems to be designed and deployed to meet this growing need.