Sketching is widely used in randomized linear algebra for low-rank matrix approximation, column subset selection, and many other problems, and it has gained significant traction in machine learning applications. However, sketching large matrices often necessitates distributed-memory algorithms, where communication overhead becomes a critical bottleneck on modern supercomputing clusters. Despite its growing relevance, distributed-memory parallel strategies for sketching remain largely unexplored. In this work, we establish communication lower bounds for sketching with dense matrices, which determine how much data movement is required to perform the sketch in parallel. An important consequence of these bounds is that no communication is required when the number of processors is small. We show that our lower bounds are tight by presenting communication-optimal algorithms. Furthermore, we extend our approach to derive communication lower bounds for computing the Nyström approximation, in which sketching is applied twice. We also introduce novel parallel algorithms whose communication costs are close to these lower bounds. Finally, we implement our algorithms on state-of-the-art supercomputing infrastructures that include both CPU- and GPU-equipped systems, and we demonstrate their parallel scalability.
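To make the observation about small processor counts concrete, the following is a minimal illustrative sketch, not the algorithm from this work. It assumes a right sketch B = AΩ with a dense Gaussian Ω, a 1D row partition of A, and a seed shared by all processors so that each one can regenerate Ω locally. Under those assumptions every processor computes its row block of B without exchanging any data. The helper names (`gaussian_sketch_matrix`, `split_rows`, `sketch_row_partitioned`) are hypothetical, and the processors are simulated by a loop rather than by real distributed-memory processes.

```python
import random


def gaussian_sketch_matrix(n_cols, sketch_size, seed):
    """Dense Gaussian sketching matrix, reproducible from a shared seed (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(sketch_size)] for _ in range(n_cols)]


def matmul(a, b):
    """Plain product of two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split_rows(a, num_procs):
    """Partition the rows of A into contiguous blocks, one per simulated processor."""
    base, extra = divmod(len(a), num_procs)
    blocks, start = [], 0
    for p in range(num_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(a[start:start + size])
        start += size
    return blocks


def sketch_row_partitioned(a, sketch_size, num_procs, seed=0):
    """Each simulated processor sketches only the row block of A that it owns."""
    local_results = []
    for block in split_rows(a, num_procs):
        # Omega is regenerated locally from the shared seed, so nothing is sent.
        omega = gaussian_sketch_matrix(len(a[0]), sketch_size, seed)
        local_results.append(matmul(block, omega))
    return local_results


# Small example: an 8 x 6 matrix sketched to 3 columns on 4 simulated processors.
data_rng = random.Random(42)
A = [[data_rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(8)]

local_blocks = sketch_row_partitioned(A, sketch_size=3, num_procs=4)
distributed = [row for block in local_blocks for row in block]

# Reference: the same sketch computed on a single processor.
serial = matmul(A, gaussian_sketch_matrix(6, 3, seed=0))
```

Stacking the local blocks reproduces the single-processor result. The price is that every processor regenerates all of Ω, so the work on the random matrix is redundant. This toy example says nothing about where that trade-off stops paying off or about the exact processor-count threshold established by the lower bounds.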