We design and deploy at Amazon the first production datacenter fabrics based on random graphs. Although the cost and fault-tolerance benefits of such topologies have long been known, their practical realization has been hampered by the lack of scalable routing and cabling approaches. Our design, called RNG, includes a new distributed routing protocol that exploits the properties of random graphs to find a large number of edge-disjoint paths between endpoint pairs. A novel passive optical device that internally shuffles cable endpoints makes the cabling complexity of RNG fabrics at Amazon similar to that of fat trees. We show that RNG fabrics match or exceed the performance of fat trees across a range of traffic patterns, despite being up to 45% cheaper. At Amazon, we have made RNG the default datacenter fabric for most workloads.
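The property the routing protocol relies on, many edge-disjoint paths between endpoint pairs in a random graph, can be checked offline with a small experiment. The sketch below is not RNG's distributed routing protocol, which is not specified here. It is a centralized illustration under assumed parameters: it builds a random d-regular switch graph by random stub pairing with restarts, then counts edge-disjoint paths between two switches using unit-capacity max-flow (Edmonds-Karp), which by Menger's theorem equals the maximum number of edge-disjoint paths. The helper names `random_regular_graph` and `edge_disjoint_paths`, and the sizes n = 64 and d = 8, are hypothetical choices for illustration only.

```python
import random
from collections import deque


def random_regular_graph(n, d, rng, max_restarts=1000):
    """Build a simple random d-regular graph on n nodes as a list of neighbor sets.

    Hypothetical helper: each node gets d stubs (ports); random pairs of free
    stubs are wired together, skipping self-loops and duplicate links. If the
    pairing gets stuck, the whole attempt restarts from scratch.
    """
    if d >= n or (n * d) % 2:
        raise ValueError("need d < n and n*d even")
    for _ in range(max_restarts):
        adj = [set() for _ in range(n)]
        stubs = [v for v in range(n) for _ in range(d)]
        stuck = False
        while stubs:
            for _ in range(100):  # bounded retries before abandoning this attempt
                i, j = rng.sample(range(len(stubs)), 2)
                u, v = stubs[i], stubs[j]
                if u != v and v not in adj[u]:
                    break
            else:
                stuck = True
                break
            adj[u].add(v)
            adj[v].add(u)
            for k in sorted((i, j), reverse=True):  # pop higher index first
                stubs.pop(k)
        if not stuck:
            return adj
    raise RuntimeError("failed to build a simple regular graph")


def edge_disjoint_paths(adj, s, t):
    """Count edge-disjoint s-t paths (s != t) via unit-capacity max-flow.

    Each undirected link is modeled as two directed arcs of capacity 1;
    BFS finds shortest augmenting paths in the residual graph.
    """
    cap = {(u, v): 1 for u, nbrs in enumerate(adj) for v in nbrs}
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow  # no augmenting path left: flow is maximal
        v = t
        while parent[v] is not None:  # push one unit back along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


if __name__ == "__main__":
    g = random_regular_graph(64, 8, random.Random(42))
    print(edge_disjoint_paths(g, 0, 1))
```

Because random d-regular graphs with fixed d ≥ 3 are d-connected with high probability, the printed count typically equals d, which is the most any endpoint pair can have when each switch has d fabric links.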