Relative Principals, Pluralistic Alignment, and the Structural Value Alignment Problem


From arXiv. Accepted at the Ninth Annual ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT), 2026.

The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes focused on hypothetical future systems. I argue that the problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but whether it is aligned enough, for whom, and at what cost. Drawing on the principal-agent framework from economics, this paper reconceptualises misalignment as arising along three interacting axes: objectives, information, and principals. The three-axis framework provides a systematic way of diagnosing why misalignment arises in real-world systems. It also clarifies that alignment cannot be treated as a single technical property of models; it is an outcome shaped by how objectives are specified, how information is distributed, and whose interests count in practice. The core contribution of this paper is to show that the three-axis decomposition implies that alignment is fundamentally a problem of governance rather than of engineering alone. From this perspective, alignment is inherently pluralistic and context-dependent, and resolving misalignment involves trade-offs among competing values. Because misalignment can occur along each axis, and can affect stakeholders differently, the structural description shows that alignment cannot be "solved" through technical design alone. It must instead be managed through ongoing institutional processes that determine how objectives are set, how systems are evaluated, and how affected communities can contest or reshape those decisions.

