Most existing datacenter transport protocols rely on in-order packet delivery, a design choice rooted in legacy systems and simplicity. However, technologies such as RDMA have made it feasible to relax this requirement, allowing more effective use of modern datacenter topologies like FatTree and Dragonfly. The rise of AI/ML workloads underscores the need for higher link utilization, which single-path load balancers struggle to deliver because of issues like ECMP collisions. In this paper, we introduce REPS, a novel per-packet traffic load-balancing algorithm that integrates seamlessly with existing congestion control mechanisms. REPS reroutes packets around congested hotspots and unreliable or failing links while remaining simple and requiring minimal state. Our evaluation demonstrates that REPS significantly outperforms traditional packet spraying and other state-of-the-art solutions in datacenter networks, offering substantial improvements in performance and link utilization.