For fifty years, networking has fragmented whenever new workloads exposed hidden assumptions about time, ordering, failure, and trust. This paper argues that the current interconnect landscape (NVLink, UALink, Ultra Ethernet, AELink/Aethernet, TTPoE, and classical RDMA) suffers from a semantic crisis: vendor-specific divergence disguised as optimization. We trace this crisis to the Forward-In-Time-Only (FITO) category mistake embedded in every major fabric stack. We then show that each of its pathologies (aspirational RDMA completion, fire-and-forget GPU semantics, opaque proprietary stacks, incompatible multi-cloud ordering, and universal fencing) arises from the same failure: no fabric defines explicit, testable link semantics all the way from APIs down to bits on the wire. We conjecture that RDMA achieves reliability through universal fencing, which collapses concurrency into serialized checkpoints. We further conjecture that precise, minimal semantics can maintain correctness without global barriers, just as superscalar architectures did by separating execution from retirement. We describe how Open Atomic Ethernet (OAE), under the Open Compute Project, addresses the crisis through bilateral transaction primitives with explicit ordering, completion, and failure visibility. Drawing on Helland's analysis of scalable OLTP isolation (the "BIG DEAL"), we show that the crisis pervades the entire stack. Finally, we assess whether convergence on a single open standard is still possible or whether fragmentation has become structural.
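The conjecture about universal fencing versus separating execution from retirement can be made concrete with a toy timing model. The following is a minimal sketch under loudly stated assumptions: operations are independent, issue costs nothing, any number may be outstanding at once, and each has a fixed latency. The names `Op`, `fenced_schedule`, and `retire_in_order_schedule`, and the latencies, are hypothetical. They are not OAE or RDMA APIs, and neither mechanism is claimed to work this way. The sketch only contrasts two policies. Under a fence after every operation, total time is the sum of latencies. Under superscalar-style in-order retirement, operations complete out of order, yet each one becomes visible only once all of its predecessors have completed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    latency: int  # hypothetical time units from issue to completion


def fenced_schedule(ops):
    """Universal fencing (toy model): each op issues only after the previous
    op has completed. Returns (name, retire_time) pairs in program order."""
    t = 0
    retired = []
    for op in ops:
        t += op.latency  # issue at t, complete at t + latency, fence, repeat
        retired.append((op.name, t))
    return retired


def retire_in_order_schedule(ops):
    """Execution/retirement split (toy model): every op issues at t = 0 and may
    complete out of order; a reorder buffer retires op i only once ops 0..i
    have all completed. Returns (name, retire_time) pairs in program order."""
    retired = []
    frontier = 0  # latest completion time seen so far in program order
    for op in ops:
        frontier = max(frontier, op.latency)  # completion time = latency (issued at 0)
        retired.append((op.name, frontier))
    return retired


# Hypothetical workload: three independent writes with different latencies.
ops = [Op("write_a", 5), Op("write_b", 1), Op("write_c", 3)]

print(fenced_schedule(ops))           # [('write_a', 5), ('write_b', 6), ('write_c', 9)]
print(retire_in_order_schedule(ops))  # [('write_a', 5), ('write_b', 5), ('write_c', 5)]
```

Both policies expose the same retirement order (`write_a`, `write_b`, `write_c`), so an observer sees identical ordering. In this model, only the fenced policy pays for that order by serializing the underlying work.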