As artificial intelligence (AI) systems become increasingly integrated into various domains, ensuring that they align with human values becomes critical. This paper introduces a novel formalism for quantifying the alignment between AI systems and human values, using Markov Decision Processes (MDPs) as the foundational model. We treat values as desirable goals tied to actions and norms as behavioral guidelines, and examine how both can be used to guide AI decisions. The framework evaluates the degree of alignment between norms and values by assessing preference changes across state transitions in a normative world. With this formalism, AI developers and ethicists can better design and evaluate AI systems so that they operate in harmony with human values. Potential applications range from recommendation systems that emphasize well-being to autonomous vehicles that prioritize safety.
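To make the idea of "assessing preference changes across state transitions" concrete, the following is a minimal sketch, not the paper's actual definition. It assumes that a value can be represented as a numeric preference over states, that a norm can be represented as a predicate permitting or forbidding an action in a state, and that alignment can be summarized as the mean preference change over permitted transitions. The function `alignment`, the toy vehicle world, and the `SAFETY` scores are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Tuple

State = str
Transition = Tuple[State, str, State]  # (state, action, next_state)


def alignment(
    transitions: Iterable[Transition],
    preference: Callable[[State], float],
    permitted: Callable[[State, str], bool],
) -> float:
    """Mean preference change over the transitions a norm permits.

    Assumption: this averaging rule is an illustrative stand-in for the
    paper's alignment measure, which the abstract does not spell out.
    """
    deltas = [
        preference(s_next) - preference(s)
        for s, action, s_next in transitions
        if permitted(s, action)
    ]
    if not deltas:
        return 0.0  # no permitted transitions: treat as neutral
    return sum(deltas) / len(deltas)


# Hypothetical toy world: a vehicle and the value "safety".
TRANSITIONS = [
    ("cruising", "brake", "stopped"),
    ("cruising", "speed_up", "speeding"),
    ("speeding", "brake", "cruising"),
    ("speeding", "speed_up", "speeding"),
]

# Hypothetical preference scores: higher means the state better serves safety.
SAFETY = {"stopped": 1.0, "cruising": 0.5, "speeding": -1.0}


def no_norm(state: State, action: str) -> bool:
    """World without norms: every action is permitted."""
    return True


def no_speeding(state: State, action: str) -> bool:
    """Hypothetical norm: accelerating is forbidden in every state."""
    return action != "speed_up"


baseline = alignment(TRANSITIONS, SAFETY.__getitem__, no_norm)
with_norm = alignment(TRANSITIONS, SAFETY.__getitem__, no_speeding)
print(f"alignment without norm: {baseline}")
print(f"alignment with norm:    {with_norm}")
```

Under these assumptions, a norm that raises the score relative to the norm-free baseline would be read as better aligned with the value. A fuller treatment over an MDP would also weight transitions by their probabilities, which this sketch omits.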