All-to-All communication is a key performance bottleneck for distributed machine learning (ML) and high-performance computing (HPC) workloads, where dense traffic increasingly stresses scale-up interconnects. While these ML and HPC workloads have driven unprecedented infrastructure demand, optical reconfigurable networks (ORNs) offer a promising path forward. By adapting the physical topology to the active workload, ORNs reduce communication cost and improve bandwidth utilization. However, their benefit depends critically on whether the collective consists of structured phases that can be served by sparse, reusable topology states. In this paper, we revisit Bruck's All-to-All implementation and demonstrate the benefits of topology optimization in which the communication pattern and the reconfiguration strategy are co-designed. We present ReTri, a bidirectional All-to-All schedule for ORNs. ReTri uses balanced ternary block propagation to complete All-to-All in $\lceil \log_3 n\rceil$ phases. The reconfiguration strategy induced by ReTri's pairwise bidirectional exchanges allows reconfiguration delays to be amortized across multiple phases. Preliminary simulations show that ReTri improves completion time by up to $10\times$ over static All-to-All, even for millisecond-scale reconfiguration delays, and by up to $2.1\times$ over reconfigurable Bruck.
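To make the $\lceil \log_3 n\rceil$ phase count concrete, the following is a minimal Python sketch of one plausible reading of balanced ternary block propagation. It is an illustration under stated assumptions, not ReTri's actual schedule, which the abstract does not specify. The assumptions are: each block's signed rank offset (destination minus source, wrapped modulo $n$) is written in balanced ternary with digits in $\{-1, 0, +1\}$, and in phase $k$ a block moves by $+3^k$, $-3^k$, or stays put according to digit $k$. The function names `num_phases`, `balanced_ternary`, and `simulate_all_to_all` are hypothetical. The sketch models block routing only and says nothing about topology reconfiguration or timing.

```python
def num_phases(n: int) -> int:
    """Smallest k with 3**k >= n, i.e. ceil(log_3 n), using integers only."""
    k, reach = 0, 1
    while reach < n:
        reach *= 3
        k += 1
    return k


def balanced_ternary(value: int, width: int) -> list[int]:
    """Digits in {-1, 0, +1}, least significant first, with
    value == sum(d * 3**i for i, d in enumerate(digits))."""
    digits = []
    for _ in range(width):
        r = value % 3                 # Python's % is non-negative here
        digit = -1 if r == 2 else r   # remainder 2 becomes -1 with a carry
        digits.append(digit)
        value = (value - digit) // 3
    assert value == 0, "value does not fit in the requested width"
    return digits


def simulate_all_to_all(n: int) -> bool:
    """Route every (src, dst) block under the ASSUMED rule: in phase k a
    block moves by digit_k * 3**k ranks (mod n). Returns True if every
    block ends at its destination after num_phases(n) phases."""
    phases = num_phases(n)
    holder = {}  # (src, dst) -> rank currently holding the block
    digits = {}  # (src, dst) -> balanced ternary digits of the signed offset
    for src in range(n):
        for dst in range(n):
            offset = (dst - src) % n
            if offset > n // 2:       # pick the representative nearest zero
                offset -= n
            holder[(src, dst)] = src
            digits[(src, dst)] = balanced_ternary(offset, phases)

    for k in range(phases):
        step = 3 ** k
        for key, d in digits.items():
            # +1: forward by 3**k, -1: backward by 3**k, 0: stay
            holder[key] = (holder[key] + d[k] * step) % n

    return all(holder[(src, dst)] == dst for (src, dst) in holder)


if __name__ == "__main__":
    for n in (3, 9, 10, 27, 28):
        print(n, num_phases(n), simulate_all_to_all(n))
```

In this model, each phase uses only the two offsets $+3^k$ and $-3^k$, so every rank exchanges with the same pair of neighbors in both directions. That is consistent with the pairwise bidirectional exchanges the abstract describes, though the mapping to actual ORN topology states is left open here.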