Existing alignment research is dominated by concerns about safety and harm prevention: safeguards, controllability, and compliance. This paradigm parallels early psychology's focus on mental illness: necessary but incomplete. What we call Positive Alignment is the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative. It is a distinct and necessary agenda within AI alignment research. We argue that several existing alignment failures (e.g., engagement hacking, loss of human autonomy, failures in truth-seeking, low epistemic humility, weak error correction, lack of diverse viewpoints, and being primarily reactive rather than proactive) may be better addressed through positive alignment, including cultivating virtues and maximizing human flourishing. We highlight a range of challenges, open questions, and technical directions (e.g., data filtering and upsampling, pre- and post-training, evaluations, collaborative value collection) for different phases of the LLM and agent lifecycle. We end with design principles for promoting disagreement and decentralization through contextual grounding, community customization, continual adaptation, and polycentric governance; that is, many legitimate centers of oversight rather than one institutional or moral chokepoint.