Distributed high-dimensional mean estimation is a common aggregation routine in distributed optimization methods. Most of these applications operate in a communication-constrained setting, where the vectors whose mean is to be estimated must be compressed before sharing. One could encode and decode each vector independently to achieve compression, but this overlooks the fact that these vectors are often close to one another. To exploit these similarities, several correlation-aware compression schemes have recently been proposed (Jhunjhunwala et al., 2021; Suresh et al., 2022; Jiang et al., 2023). However, in most cases the correlations must be known in advance for these schemes to work. Moreover, theoretical analysis of how gracefully these correlation-aware compression schemes degrade with increasing dissimilarity is limited in the literature to the $\ell_2$ error. In this paper, we propose four collaborative compression schemes that agnostically exploit the similarities among vectors in a distributed setting. All of our schemes are simple to implement and computationally efficient, while yielding substantial savings in communication. Our analysis shows how the $\ell_2$, $\ell_\infty$, and cosine estimation errors vary with the degree of similarity among the vectors.
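To make the setting concrete, the following is a minimal Python sketch of distributed mean estimation under 1-bit compression. It contrasts independent quantization of each client's vector with a simple similarity-exploiting variant that quantizes residuals against a shared reference. The residual scheme only illustrates the general idea of exploiting closeness among vectors; it is an assumption for illustration, not one of the four schemes proposed in this paper, and all names (e.g. `quantize_1bit`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 1000          # number of clients, dimension
sigma = 0.1              # controls how similar the client vectors are

# Client vectors: a shared component plus small client-specific noise,
# so the vectors are close to one another (the regime targeted here).
mu = rng.normal(size=d)
X = mu + sigma * rng.normal(size=(n, d))
true_mean = X.mean(axis=0)

def quantize_1bit(v):
    """1-bit-per-coordinate sign quantizer: each client transmits
    sign(v) plus a single scalar scale (the mean absolute value of v),
    instead of d full-precision floats."""
    scale = np.abs(v).mean()
    return scale * np.sign(v)

# Baseline: each client independently quantizes its full vector.
est_independent = np.mean([quantize_1bit(x) for x in X], axis=0)

# Similarity-exploiting variant: clients quantize only the residual
# against a shared reference (here, client 0's vector, assumed to be
# broadcast once). Residuals are small when the vectors are similar,
# so the quantization error shrinks with the degree of similarity.
ref = X[0]
est_residual = ref + np.mean([quantize_1bit(x - ref) for x in X], axis=0)

for name, est in [("independent", est_independent),
                  ("residual   ", est_residual)]:
    print(name, "l2 error:", np.linalg.norm(est - true_mean))
```

Running the sketch with decreasing `sigma` shows the residual variant's $\ell_2$ error shrinking with the similarity of the vectors, while the independent baseline's error does not; this is the kind of similarity-dependent behavior the analysis in the paper quantifies.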