The value alignment problem for artificial intelligence (AI) is often framed as a purely technical or normative challenge, sometimes one focused on hypothetical future systems. I argue that the problem is better understood as a structural question about governance: not whether an AI system is aligned in the abstract, but whether it is aligned enough, for whom, and at what cost. Drawing on the principal-agent framework from economics, this paper reconceptualises misalignment as arising along three interacting axes: objectives, information, and principals. The three-axis framework offers a systematic way to diagnose why misalignment arises in real-world systems. It also clarifies that alignment cannot be treated as a single technical property of a model. Alignment is instead an outcome shaped by how objectives are specified, how information is distributed, and whose interests count in practice. The core contribution of this paper is to show that this decomposition implies that alignment is fundamentally a problem of governance, not of engineering alone. From this perspective, alignment is inherently pluralistic and context-dependent, and resolving misalignment involves trade-offs among competing values. Because misalignment can occur along each axis, and its effects fall unevenly across stakeholders, alignment cannot be "solved" through technical design alone. It must instead be managed through ongoing institutional processes that determine how objectives are set, how systems are evaluated, and how affected communities can contest or reshape those decisions.