The critical inquiry pervading Philosophy, and perhaps all Humanities disciplines, concerns morality and normativity. Surprisingly, in recent years, this thematic thread has woven its way into an unexpected domain, one not conventionally associated with pondering "what ought to be": artificial intelligence (AI) research. Central to morality and AI is "alignment", the problem of expressing human goals and values in a manner that artificial systems can follow without producing unwanted adversarial effects. More explicitly, and with the current paradigm of AI development in mind, we can think of alignment as teaching human values to non-anthropomorphic entities trained through opaque, gradient-based learning techniques. This work addresses alignment as a technical-philosophical problem that requires both solid philosophical foundations and practical implementations that bring normative theory into AI system development. To accomplish this, we propose two sets of conditions, necessary and sufficient, that we argue should be considered in any alignment process. While the necessary conditions serve as metaphysical and metaethical roots that pertain to the permissibility of alignment, the sufficient conditions establish a blueprint for aligning AI systems under a learning-based paradigm. After laying these foundations, we present implementations of this approach using state-of-the-art techniques and methods for aligning general-purpose language systems. We call this framework Dynamic Normativity. Its central thesis is that any alignment process under a learning paradigm that cannot fulfill its necessary and sufficient conditions will fail to produce aligned systems.