Privacy-preserving distributed distribution comparison measures the distance between distributions whose data are scattered across different agents in a distributed system and cannot be shared among those agents. In this study, we propose a novel decentralized entropic optimal transport (EOT) method, which provides a privacy-preserving and communication-efficient solution to this problem with theoretical guarantees. In particular, we design a mini-batch randomized block-coordinate descent (MRBCD) scheme to optimize the decentralized EOT distance in its dual form. The dual variables are scattered across different agents and updated locally and iteratively, with limited communication among a subset of the agents. The kernel matrix involved in the gradients of the dual variables is estimated by a distributed kernel approximation method, so each agent only needs to approximate and store a sub-kernel matrix through one-shot communication, without sharing raw data. We analyze our method's communication complexity and provide a theoretical bound on the approximation error caused by the convergence error, the approximated kernel, and the mismatch between the storage and communication protocols. Experiments on synthetic data and real-world distributed domain adaptation tasks demonstrate the effectiveness of our method.
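To make the idea of block-wise updates on the EOT dual concrete, the following is a minimal single-process sketch and not the MRBCD scheme itself. It assumes the standard dual objective $\langle a, f\rangle + \langle b, g\rangle - \varepsilon \sum_{ij} \exp((f_i + g_j - C_{ij})/\varepsilon)$ and replaces the mini-batch gradient step with an exact maximization over a randomly chosen block of dual variables (a Sinkhorn-style update). It uses the full kernel matrix and models no agents, communication, or kernel approximation. The function name `block_sinkhorn_eot` and all of its parameters are hypothetical.

```python
import math
import random


def block_sinkhorn_eot(a, b, C, eps=0.5, block_size=2, n_iters=3000, seed=0):
    """Randomized block-coordinate ascent on the EOT dual (illustrative sketch).

    a, b : marginal weights (strictly positive, each summing to 1)
    C    : cost matrix as a list of rows, shape len(a) x len(b)
    eps  : entropic regularization strength (assumed value)
    """
    rng = random.Random(seed)
    m, n = len(a), len(b)
    # Full Gibbs kernel; the abstract's method only approximates sub-blocks of it.
    K = [[math.exp(-C[i][j] / eps) for j in range(n)] for i in range(m)]
    f = [0.0] * m  # dual variables attached to the source samples
    g = [0.0] * n  # dual variables attached to the target samples

    for _ in range(n_iters):
        if rng.random() < 0.5:
            # Exactly maximize the dual over a random block of f, with g fixed.
            v = [math.exp(gj / eps) for gj in g]
            for i in rng.sample(range(m), min(block_size, m)):
                kv = sum(K[i][j] * v[j] for j in range(n))
                f[i] = eps * (math.log(a[i]) - math.log(kv))
        else:
            # Exactly maximize the dual over a random block of g, with f fixed.
            u = [math.exp(fi / eps) for fi in f]
            for j in rng.sample(range(n), min(block_size, n)):
                ku = sum(K[i][j] * u[i] for i in range(m))
                g[j] = eps * (math.log(b[j]) - math.log(ku))

    # Recover the transport plan from the dual variables.
    T = [[math.exp((f[i] + g[j] - C[i][j]) / eps) for j in range(n)]
         for i in range(m)]
    return f, g, T
```

In this sketch, the row and column sums of the returned plan `T` approach `a` and `b` as the number of iterations grows, which gives a simple check that the block updates are converging.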