Octopus: Enhancing CXL Memory Pods via Sparse Topology

The Compute Express Link (CXL) interconnect enables compute "pods" that pool memory across servers to reduce cost and improve efficiency. These pods also facilitate pairwise communication whose needs conflict with pooling. Importantly, existing pod designs are small or require indirection through expensive switches. These conventional designs implicitly assume that pods must fully connect all servers to all CXL pooling devices. This paper breaks with this conventional wisdom by introducing Octopus pods. Octopus directly connects servers to low-port-count CXL pooling devices (e.g., 4 ports) yet scales to large pods without switches by constructing a sparse CXL topology in which each pooling device connects to a carefully chosen subset of servers. Octopus explicitly balances "overlap", where two servers connect to the same pooling device: overlap reduces pooling efficiency but enables low-latency communication. Octopus resolves this tension by grouping servers into "islands" with low-latency intra-island communication and interconnecting islands to favor pooling. We build a three-server CXL pod prototype and simulate scaled pods with 96 servers under measured device characteristics and physical constraints (1.5 m copper cables). On hardware, Octopus RPCs are 3.2x faster than in-rack RDMA and 2.4x faster than CXL switches. In simulation, Octopus achieves net server cost savings of 3-5.4% whereas CXL switches result in a net cost increase.
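
The tension between overlap and pooling can be made concrete with a toy topology. The sketch below is an illustration only and is not the construction used by Octopus: the function names, the two-device-per-server layout, and the 4-island size are all assumptions chosen for brevity. Each island gets one 4-port device shared by its servers (full overlap inside the island), and additional 4-port devices each link one server from every island (sparse overlap across islands).

```python
from itertools import combinations

def build_toy_pod(num_islands=4, ports=4):
    """Build a toy sparse pod (hypothetical layout, not the paper's design).

    Servers are (island, slot) pairs; each device is the set of servers
    attached to its ports.
    """
    assert num_islands <= ports, "toy layout needs one port per island"
    servers = [(i, j) for i in range(num_islands) for j in range(ports)]
    devices = []
    # Island devices: connect every server inside one island.
    for i in range(num_islands):
        devices.append({(i, j) for j in range(ports)})
    # Cross-island devices: connect slot j of every island.
    for j in range(ports):
        devices.append({(i, j) for i in range(num_islands)})
    return servers, devices

def shares_device(devices, a, b):
    """True if servers a and b overlap, i.e. attach to a common device."""
    return any(a in d and b in d for d in devices)

def overlapping_pairs(servers, devices):
    """Count server pairs that can communicate through a shared device."""
    return sum(shares_device(devices, a, b) for a, b in combinations(servers, 2))

servers, devices = build_toy_pod()
total_pairs = len(servers) * (len(servers) - 1) // 2
print(len(servers), "servers,", len(devices), "devices")
print(overlapping_pairs(servers, devices), "of", total_pairs, "pairs overlap")
```

In this toy layout every device uses at most 4 ports, all pairs inside an island overlap, and most cross-island pairs do not, which mirrors the qualitative trade-off described in the abstract without reproducing its actual topology or results.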

