Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers (ADMM), a method widely used in the optimization literature for its fast convergence. In contrast to distributed optimization, distributed sampling allows for uncertainty quantification in Bayesian inference tasks. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority over state-of-the-art methods. For our theoretical results, we use convex optimization tools to establish a fundamental inequality on the generated local sample iterates. This inequality enables us to show that the distribution associated with these iterates converges to the underlying target distribution in Wasserstein distance. In simulations, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
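As a rough illustration of the linear regression setting and the Wasserstein metric mentioned in the abstract, the sketch below builds the exact Gaussian posterior of a Bayesian linear regression from per-node sufficient statistics and measures how far a set of samples is from it. It is not the ADMM-based sampler proposed in the paper. The function names, the Gaussian prior, the known noise variance, and the synthetic data are all assumptions made for this example. The distance is computed with the closed-form 2-Wasserstein formula for Gaussians applied to the samples' empirical mean and covariance, so it is a Gaussian approximation of the distance rather than the exact empirical one.

```python
import numpy as np


def sqrtm_psd(a):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def gaussian_w2(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1) and N(m2, s2)."""
    root = sqrtm_psd(s2)
    cross = sqrtm_psd(root @ s1 @ root)
    sq = np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross)
    return float(np.sqrt(max(sq, 0.0)))


def posterior_from_local_stats(local_data, noise_var=1.0, prior_prec=1.0):
    """Gaussian posterior over weights, assuming prior N(0, I / prior_prec).

    Each node only contributes X_i^T X_i and X_i^T y_i, never raw data.
    """
    d = local_data[0][0].shape[1]
    precision = prior_prec * np.eye(d)
    shift = np.zeros(d)
    for x_i, y_i in local_data:
        precision += x_i.T @ x_i / noise_var
        shift += x_i.T @ y_i / noise_var
    cov = np.linalg.inv(precision)
    return cov @ shift, cov


# Hypothetical setup: 4 nodes, 25 observations each, 3 features.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
local_data = []
for _ in range(4):
    x_i = rng.normal(size=(25, 3))
    y_i = x_i @ w_true + rng.normal(size=25)
    local_data.append((x_i, y_i))

mean, cov = posterior_from_local_stats(local_data)

# Stand-in for sampler output: exact draws from the target posterior.
samples = rng.multivariate_normal(mean, cov, size=5000)
emp_mean = samples.mean(axis=0)
emp_cov = np.cov(samples, rowvar=False)

w2_estimate = gaussian_w2(emp_mean, emp_cov, mean, cov)
print(f"Gaussian-approximated W2 to the target posterior: {w2_estimate:.4f}")
```

In an actual experiment, `samples` would hold the iterates produced by a distributed sampler, and tracking `w2_estimate` over iterations would give one way to compare convergence speed across methods. The posterior covariance `cov` is what a point estimate from distributed optimization does not provide.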