Distributed optimization has become increasingly relevant in recent years. Compared with non-distributed methods, it offers several advantages, such as processing large amounts of data in less time. However, most distributed approaches suffer from a significant bottleneck: the cost of communication. Consequently, a large body of recent research has been directed at this problem. One such approach exploits local data similarity; in particular, there exists an algorithm that provably exploits the similarity property optimally. However, this result, like those of other works, addresses the communication bottleneck only by relying on the fact that communication is significantly more expensive than local computation. It does not take into account the differing capacities of network devices or the varying relationship between communication time and local computation costs. We consider this setting, and the objective of this study is to find the optimal ratio for distributing data between the server and the local machines for arbitrary costs of communication and local computation. We compare the running times of the network under uniform and optimal data distributions. The superior theoretical performance of our solutions is validated experimentally.