With the rapid growth of artificial intelligence (AI) workloads in datacenters, the Ultra Ethernet Consortium (UEC) has defined a new high-performance transport layer to deliver the required performance at scale. A core component of this new standard is the Network Signal-based Congestion Control (NSCC) algorithm. This paper presents SMaRTT, the algorithm that forms the basis of the UEC NSCC specification. SMaRTT is a sender-based congestion control algorithm that systematically combines delay, Explicit Congestion Notification (ECN), and optional packet trimming into a cohesive state machine for fast, fair and precise window adjustments with seamless multipath support. At its core lies the novel QuickAdapt algorithm that accurately estimates and rapidly adapts to available capacity. Our evaluation shows that SMaRTT outperforms existing datacenter congestion control algorithms like Swift, RoCE, and MPRDMA by up to 50% and provides superior fairness, validating the design choices made in the UEC standard.