Architectural Wisdom: A Framework for Governing Optimization in AI Systems

Modern AI systems exhibit structural failures that capability scaling alone does not reliably fix: they optimize under-specified objectives with no architectural mechanism to question whether the objective should be optimized at all. Engagement maximization can amplify harmful pathways; tool-using agents can commit irreversible actions; preference-trained language models can become sycophantic. We argue that this failure pattern is a wisdom problem, not an intelligence problem. We use "wisdom" in a deliberately architectural sense, not as a claim about virtue, consciousness, or moral omniscience. Intelligence accepts a goal and optimizes within it; wisdom interrogates whether the goal should be optimized at all. The two are separable architectural properties. We propose architectural wisdom as a corrigible objective-governance layer above the optimization substrate. Before any action, the layer makes three structural commitments explicit and nondegenerate: temporal horizon, relational boundary, and irreversibility. It is realized by four components (Structural Utility Transform, Moral Admissibility Interface, Arbitration and Escalation Controller, Value Revision Channel) that compute a six-coordinate wisdom tuple over horizon, relational coverage, irreversibility, admissibility, value revision, and auditability. We motivate the architecture with eight cases drawn from contemporary AI failures, secular wisdom traditions, and hard ethical situations, and we defend the distinction against the intelligence-completeness thesis with four arguments: the distinction between goal-questioning and goal-taking, Bostrom's orthogonality thesis, the structural separation observed in our exemplar cases, and failure modes that persist despite capability scaling. The framework is the conceptual contract for a larger architecture whose formal specifications and empirical validation are developed in subsequent work.
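
The abstract names the three commitments, the four components, and the six tuple coordinates but defers their formal specification. The following is only a minimal Python sketch of how such a pre-action gate might be wired, under loudly stated assumptions: the names `Commitments`, `WisdomTuple`, `Decision`, and `govern` are hypothetical; the coordinate types and scales are invented for illustration; and the mapping of the four components onto the gate steps (admissibility check, escalation rule, revisability flag, audit log) is our assumption, not the authors' design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


@dataclass(frozen=True)
class Commitments:
    """Hypothetical encoding of the three structural commitments."""
    horizon_steps: int            # temporal horizon; <= 0 counts as degenerate
    stakeholders: frozenset       # relational boundary; empty counts as degenerate
    irreversible: Optional[bool]  # None means "not assessed", i.e. degenerate

    def nondegenerate(self) -> bool:
        return (self.horizon_steps > 0
                and len(self.stakeholders) > 0
                and self.irreversible is not None)


@dataclass(frozen=True)
class WisdomTuple:
    """The six coordinates named in the abstract; the types are assumed."""
    horizon: int
    relational_coverage: float  # fraction of affected parties inside the boundary
    irreversibility: bool
    admissibility: bool         # supplied by an admissibility check (assumed boolean)
    value_revision: bool        # can the objective still be revised?
    auditability: bool          # True because the decision is appended to the log


class Decision(Enum):
    PROCEED = "proceed"
    ESCALATE = "escalate"
    REJECT = "reject"


def govern(c: Commitments, affected: frozenset, admissible: bool,
           revisable: bool, log: list) -> tuple[Decision, Optional[WisdomTuple]]:
    """Hypothetical gate that runs before the optimizer is allowed to act."""
    # Refuse to act at all if any commitment is missing or degenerate.
    if not c.nondegenerate():
        log.append("reject: degenerate commitments")
        return Decision.REJECT, None
    coverage = len(affected & c.stakeholders) / len(affected) if affected else 1.0
    w = WisdomTuple(c.horizon_steps, coverage, c.irreversible, admissible,
                    revisable, auditability=True)
    if not w.admissibility:
        decision = Decision.REJECT
    # Assumed arbitration rule: irreversible actions escalate unless every
    # affected party is inside the boundary and the objective stays revisable.
    elif w.irreversibility and (w.relational_coverage < 1.0 or not w.value_revision):
        decision = Decision.ESCALATE
    else:
        decision = Decision.PROCEED
    log.append(f"{decision.value}: {w}")
    return decision, w


# Usage: an irreversible action that touches a party outside the boundary.
audit = []
c = Commitments(30, frozenset({"user", "third_party"}), irreversible=True)
d, w = govern(c, frozenset({"user", "third_party", "public"}), True, True, audit)
print(d.value)  # escalate
```

In this sketch, degenerate commitments cause a rejection instead of falling back to defaults, which is one way to read the requirement that the commitments be explicit and nondegenerate before any action.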
