Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Large language models are undergoing a transition from model technology to system technology. Engineering challenges such as cache reuse, context capacity, agent scheduling, and permission control resemble classical computer-systems problems. This raises a question: if we treat the LLM as a CPU, the KV cache as a processor cache, the context window as main memory, and the agent framework as an operating system, can decades of computer-architecture wisdom guide next-generation model-native systems? This paper pursues the analogy as a visionary survey. We map computer-architecture concepts onto the emerging model-native stack and survey the literature on LLM-as-OS, memory management, agent frameworks, tool protocols, multi-agent coordination, cognitive architectures, and safety governance. We find that each line of work addresses a different layer and that no unifying model exists. We propose the Intelligent Computing Architecture (ICA): six functional layers with interface contracts and design axioms. We resolve the tension over whether the LLM resembles a CPU or an OS through a dual-plane architecture. It consists of a probabilistic execution plane (what can be computed) and a deterministic control plane (what should be computed), and every layer spans both planes as a graded crossover. We propose three Amdahl-style design heuristics (Semantic Locality, Context Budget, and Agent Speedup) as organizing back-of-the-envelope models, illustrate their parameter ranges with published data, and identify predictive validation as the principal open task. We articulate the boundaries of the analogy, note differences between silicon-era and model-era architectures, and propose a research roadmap. This is a conceptual and survey contribution with no new experimental results.
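The abstract does not define the Agent Speedup heuristic. The sketch below therefore shows only the classical Amdahl's law it is named after, under a hypothetical reading: a fraction p of an agent task can be split across n concurrent agents, while the remainder (for example planning and merging results) stays serial. The function name `amdahl_speedup` and the value p = 0.8 are illustrative assumptions. They are not the paper's model or data.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Classical Amdahl's law: overall speedup when only `parallel_fraction`
    of the work can be spread evenly across `n_workers` identical workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial = 1.0 - parallel_fraction  # share that cannot be parallelized
    return 1.0 / (serial + parallel_fraction / n_workers)


# Hypothetical agent reading (assumption, not from the paper): p is the share
# of a task that independent sub-agents can work on concurrently. The value
# 0.8 is illustrative only, not a measurement.
for agents in (1, 2, 4, 8, 64):
    print(agents, round(amdahl_speedup(0.8, agents), 2))
```

With p = 0.8, the speedup approaches but never exceeds 1/(1 − p) = 5, however many agents are added. Exposing that kind of serial ceiling is what an Amdahl-style back-of-the-envelope model is for. The paper's own heuristic may include terms, such as coordination overhead, that this sketch does not model.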