The challenge of aligning artificial intelligence (AI) with human values persists because moral principles are abstract and often conflicting, and because existing approaches are opaque. This paper introduces CogniAlign, a multi-agent deliberation framework based on naturalistic moral realism. CogniAlign grounds moral reasoning in survivability, defined across individual and collective dimensions, and operationalizes it through structured deliberation among discipline-specific scientist agents. The agents, representing neuroscience, psychology, sociology, and evolutionary biology, provide arguments and rebuttals that an arbiter synthesizes into transparent, empirically anchored judgments. As a proof-of-concept study, we evaluate CogniAlign on classic and novel moral questions and, with the help of three experts, compare its outputs against GPT-4o using a five-part ethical audit framework. Results show that CogniAlign consistently outperforms the baseline across more than sixty moral questions, with average gains of 12.2 points in analytic quality, 31.2 points in decisiveness, and 15 points in depth of explanation. In the Heinz dilemma, for example, CogniAlign achieved an overall score of 79 versus 65.8 for GPT-4o, demonstrating a decisive advantage in handling moral reasoning. Through transparent and structured reasoning, CogniAlign demonstrates the feasibility of an auditable approach to AI alignment, although some challenges remain.