The Compute Express Link (CXL) interconnect enables compute "pods" that pool memory across servers to reduce cost and improve efficiency. These pods also facilitate pairwise communication whose needs conflict with pooling. Importantly, existing pod designs are small or require indirection through expensive switches. These conventional designs implicitly assume that pods must fully connect all servers to all CXL pooling devices. This paper breaks with this conventional wisdom by introducing Octopus pods. Octopus directly connects servers to low-port-count CXL pooling devices (e.g., 4 ports) yet scales to large pods without switches by constructing a sparse CXL topology in which each pooling device connects to a carefully chosen subset of servers. Octopus explicitly balances "overlap", where two servers connect to the same pooling device: overlap reduces pooling efficiency but enables low-latency communication. Octopus resolves this tension by grouping servers into "islands" with low-latency intra-island communication and interconnecting islands to favor pooling. We build a three-server CXL pod prototype and simulate scaled pods with 96 servers under measured device characteristics and physical constraints (1.5 m copper cables). On hardware, Octopus RPCs are 3.2x faster than in-rack RDMA and 2.4x faster than CXL switches. In simulation, Octopus achieves net server cost savings of 3-5.4% whereas CXL switches result in a net cost increase.
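The overlap trade-off described in the abstract can be illustrated with a small sketch. The topology below is a hypothetical toy example, not the construction used by Octopus: the names `toy_topology`, `pair_overlap`, and `reachable_devices`, the device labels, and the two three-server islands are all assumptions made for illustration. Only the 4-port device limit comes from the text.

```python
from itertools import combinations

PORTS_PER_DEVICE = 4  # example port count mentioned in the abstract


def pair_overlap(topology):
    """Count, for each server pair, how many pooling devices both attach to."""
    overlap = {}
    for servers in topology.values():
        for a, b in combinations(sorted(servers), 2):
            overlap[(a, b)] = overlap.get((a, b), 0) + 1
    return overlap


def reachable_devices(topology, server):
    """Return the set of pooling devices a server is directly cabled to."""
    return {dev for dev, servers in topology.items() if server in servers}


# Hypothetical pod: servers 0-2 form one island, servers 3-5 another.
toy_topology = {
    "d0": {0, 1, 2},  # shared within island A: every pair in A overlaps
    "d1": {3, 4, 5},  # shared within island B
    "d2": {0, 3},     # inter-island links: each touches one server per island
    "d3": {1, 4},
    "d4": {2, 5},
}

# Every device must respect the port limit.
assert all(len(s) <= PORTS_PER_DEVICE for s in toy_topology.values())

overlap = pair_overlap(toy_topology)
total_pairs = len(list(combinations(range(6), 2)))
print(f"{len(overlap)} of {total_pairs} server pairs share a device")
print("devices reachable from server 0:", sorted(reachable_devices(toy_topology, 0)))
```

In this toy pod, servers in the same island always share a device and could communicate through it directly, while most cross-island pairs share none. The inter-island devices instead widen the set of servers whose memory demand is pooled together, which is the tension between overlap and pooling that the abstract describes.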