We study distributed Sinkhorn iterations for entropy-regularized optimal transport when the Gibbs kernel operator is row-partitioned across c workers and cannot be centralized. We present Federated Sinkhorn: two exact synchronous protocols that exchange only scaling-vector slices, (i) an All-to-All scheme implemented via Allgather, and (ii) a Star (parameter-server) scheme implemented via client-to-server sends and server-to-client broadcasts. For both, we derive closed-form per-iteration compute, communication, and memory costs under an alpha-beta latency-bandwidth model, and show that the distributed iterates match centralized Sinkhorn under standard positivity assumptions. Multi-node CPU/GPU experiments validate the cost model and show that the repeated global exchange of scaling vectors quickly becomes the dominant bottleneck as c grows. We also report an optional bounded-delay asynchronous schedule and an optional privacy measurement layer for the communicated log-scalings.
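The exactness claim above can be checked in a single-process simulation: partition the Gibbs kernel into row blocks, have each "worker" update only its slice of the scaling vector u, emulate the Allgather by concatenating the slices, and emulate the reduction behind the v-update by summing the workers' partial products K_j^T u_j. This is a minimal sketch under assumed details (uniform marginals, a random cost matrix, and a simple block partition); it is not the paper's implementation, which communicates across real nodes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, eps, iters = 12, 3, 0.5, 200      # problem size, workers, regularization
C = rng.random((n, n))                  # hypothetical cost matrix
K = np.exp(-C / eps)                    # Gibbs kernel (entrywise positive)
a = np.full(n, 1.0 / n)                 # source marginal
b = np.full(n, 1.0 / n)                 # target marginal

# Centralized Sinkhorn iterations for reference.
u, v = np.ones(n), np.ones(n)
for _ in range(iters):
    u = a / (K @ v)
    v = b / (K.T @ u)

# Simulated All-to-All scheme: worker j holds the row block K[rows_j].
blocks = np.array_split(np.arange(n), c)
ud, vd = np.ones(n), np.ones(n)
for _ in range(iters):
    # Each worker updates its own u-slice from the full v, then the
    # slices are "Allgathered" (here: concatenated in block order).
    ud = np.concatenate([a[rows] / (K[rows] @ vd) for rows in blocks])
    # Each worker contributes the partial product K_j^T u_j; the sum
    # plays the role of the reduction that completes the v-update.
    s = sum(K[rows].T @ ud[rows] for rows in blocks)
    vd = b / s

# Distributed iterates match the centralized ones up to float summation order.
assert np.allclose(u, ud) and np.allclose(v, vd)
```

The u-update is embarrassingly row-parallel, so only the v-update forces communication; in the simulation the concatenation and the partial-sum loop stand in for the Allgather and reduction whose alpha-beta costs the abstract refers to.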