Positive Alignment: Artificial Intelligence for Human Flourishing

Ruben Laukkonen, Seb Krier, Chloé Bakalar, Shamil Chandaria, Morten Kringelbach, Adam Elwood, Daniel Ford, Fernando Rosas, Maty Bohacek, Matija Franklin, Nenad Tomašev, Stephanie Chan, Verena Rieser, Roma Patel, Michael Levin, Arun Rao

Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete. What we call Positive Alignment is the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative. It is a distinct and necessary agenda within AI alignment research. We argue that several existing failures of alignment (e.g., engagement hacking, loss of human autonomy, failures in truth-seeking, low epistemic humility, poor error correction, lack of diverse viewpoints, and being primarily reactive rather than proactive) may be better addressed through positive alignment, including cultivating virtues and maximizing human flourishing. We highlight a range of challenges, open questions, and technical directions (e.g., data filtering and upsampling, pre- and post-training, evaluations, collaborative value collection) for different phases of the LLM and agent lifecycle. We end with design principles for promoting disagreement and decentralization through contextual grounding, community customization, continual adaptation, and polycentric governance; that is, many legitimate centers of oversight rather than one institutional or moral chokepoint.

