Existing alignment research is dominated by concerns about safety and harm prevention: safeguards, controllability, and compliance. This paradigm parallels early psychology's focus on mental illness: necessary but incomplete. We define Positive Alignment as the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative. It is a distinct and necessary agenda within AI alignment research. We argue that several existing alignment failures (e.g., engagement hacking, loss of human autonomy, failures in truth-seeking, low epistemic humility, poor error correction, lack of diverse viewpoints, and being primarily reactive rather than proactive) may be better addressed through positive alignment, including cultivating virtues and maximizing human flourishing. We highlight a range of challenges, open questions, and technical directions (e.g., data filtering and upsampling, pre- and post-training, evaluations, collaborative value collection) for different phases of the LLM and agent lifecycle. We end with design principles for promoting disagreement and decentralization through contextual grounding, community customization, continual adaptation, and polycentric governance; that is, many legitimate centers of oversight rather than one institutional or moral chokepoint.