Positive Alignment: Artificial Intelligence for Human Flourishing

Ruben Laukkonen, Seb Krier, Chloé Bakalar, Shamil Chandaria, Morten Kringelbach, Adam Elwood, Daniel Ford, Fernando Rosas, Maty Bohacek, Matija Franklin, Nenad Tomašev, Stephanie Chan, Verena Rieser, Roma Patel, Michael Levin, Arun Rao

Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete. What we call Positive Alignment is the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative. It is a distinct and necessary agenda within AI alignment research. We argue that several existing failures of alignment (e.g., engagement hacking, loss of human autonomy, failures in truth-seeking, low epistemic humility, weak error correction, lack of diverse viewpoints, and being primarily reactive rather than proactive) may be better addressed through positive alignment, including cultivating virtues and maximizing human flourishing. We highlight a range of challenges, open questions, and technical directions (e.g., data filtering and upsampling, pre- and post-training, evaluations, collaborative value collection) for different phases of the LLM and agent lifecycle. We end with design principles for promoting disagreement and decentralization through contextual grounding, community customization, continual adaptation, and polycentric governance; that is, many legitimate centers of oversight rather than one institutional or moral chokepoint.

