Modernizing Amdahl's Law: How AI Scaling Laws Shape Computer Architecture

Classical Amdahl's Law conceptualized the limit of speedup for an era of fixed serial-parallel decomposition and homogeneous replication. Modern heterogeneous systems need a different conceptual framework: constrained resources must be allocated across heterogeneous hardware while workloads themselves change, with some stages becoming effectively bounded and others continuing to absorb additional effective compute. This paper reformulates Amdahl's Law around that shift. We replace processor count with an allocation variable, replace the classical parallel fraction with a value-scalable fraction, and model specialization by a relative efficiency ratio between dedicated and programmable compute. The resulting objective yields a finite collapse threshold. For a specialized efficiency ratio R, there is a critical scalable fraction S_c = 1 - 1/R beyond which the optimal allocation to specialization becomes zero. Equivalently, for a given scalable fraction S, the minimum efficiency ratio required to justify specialization is R_c = 1/(1-S). Thus, as value-scalable workload grows, over-customization faces a rising bar. The point is not that one hardware class simply defeats another, but that architecture must preserve a sufficiently programmable substrate against a moving frontier of work whose marginal gains keep scaling. In practice, that frontier is often sustained by software- and model-driven efficiency doublings rather than by fixed-function redesign alone. The model helps explain the migration of value-producing work toward learned late-stage computation and the shared design pressure that is making both GPUs and AI accelerators more programmable.
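The two threshold formulas stated above can be checked numerically with a minimal Python sketch. It encodes only S_c = 1 - 1/R and R_c = 1/(1-S) as given in the abstract; the full allocation objective is not reproduced here. The function names, the restriction to R >= 1, and the choice to treat the boundary case S = S_c as "not justified" are assumptions made for illustration.

```python
def critical_scalable_fraction(r: float) -> float:
    """Return S_c = 1 - 1/R for a specialized efficiency ratio R.

    Assumption: R >= 1, i.e. dedicated compute is at least as
    efficient as programmable compute on the work it targets.
    """
    if r < 1.0:
        raise ValueError("efficiency ratio R is assumed to be >= 1")
    return 1.0 - 1.0 / r


def min_efficiency_ratio(s: float) -> float:
    """Return R_c = 1 / (1 - S) for a value-scalable fraction S in [0, 1)."""
    if not 0.0 <= s < 1.0:
        raise ValueError("scalable fraction S must satisfy 0 <= S < 1")
    return 1.0 / (1.0 - s)


def specialization_justified(s: float, r: float) -> bool:
    """True when S lies strictly below the critical fraction for R.

    Assumption: the boundary S == S_c is treated as not justified.
    """
    return s < critical_scalable_fraction(r)


if __name__ == "__main__":
    # Tabulate the critical scalable fraction for a few efficiency ratios.
    for r in (2.0, 4.0, 10.0, 100.0):
        print(f"R = {r:6.1f}  ->  S_c = {critical_scalable_fraction(r):.3f}")
    # Tabulate the efficiency ratio needed as the scalable fraction grows.
    for s in (0.5, 0.75, 0.9, 0.99):
        print(f"S = {s:4.2f}  ->  R_c = {min_efficiency_ratio(s):.1f}")
```

The second loop illustrates the "rising bar": because R_c = 1/(1-S) diverges as S approaches 1, each increase in the value-scalable fraction demands a disproportionately larger efficiency advantage from specialized hardware.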
