Distributed-Memory Randomized Algorithms for Sparse Tensor CP Decomposition

From arXiv. To appear in the Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'24). 14 pages, 13 figures, 5 tables.

Candecomp / PARAFAC (CP) decomposition, a generalization of the matrix singular value decomposition to higher-dimensional tensors, is a popular tool for analyzing multidimensional sparse data. On tensors with billions of nonzero entries, computing a CP decomposition is a computationally intensive task. We propose the first distributed-memory implementations of two randomized CP decomposition algorithms, CP-ARLS-LEV and STS-CP, that offer nearly an order-of-magnitude speedup at high decomposition ranks over well-tuned non-randomized decomposition packages. Both algorithms rely on leverage score sampling and enjoy strong theoretical guarantees, each with a different tradeoff between time and accuracy. We tailor the communication schedule for our random sampling algorithms, eliminating expensive reduction collectives and forcing communication costs to scale with the random sample count. Finally, we optimize the local storage format for our methods, switching between analogues of compressed sparse column and compressed sparse row formats. Experiments show that our methods are fast and scalable, achieving an 11x speedup over SPLATT and decomposing the billion-scale Reddit tensor on 512 CPU cores in under two minutes.
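As a small illustration of what a CP decomposition represents, the sketch below evaluates a rank-R CP model at the stored entries of a sparse tensor: each entry is approximated by a sum over R components of the product of one factor-matrix row entry per mode. It is not the paper's distributed or randomized method. The helper names (`cp_entry`, `residual_on_nonzeros`), the dictionary-based coordinate format, and the toy data are all assumptions made for illustration.

```python
from math import prod, sqrt

def cp_entry(factors, index):
    """Value of the rank-R CP model at one multi-index.

    factors[n][i] is row i of the mode-n factor matrix (a list of R floats).
    """
    rank = len(factors[0][0])
    # Pick the one row that each mode contributes to this entry.
    rows = [factor[i] for factor, i in zip(factors, index)]
    # Sum over components r of the product across modes.
    return sum(prod(row[r] for row in rows) for r in range(rank))

def residual_on_nonzeros(coo, factors):
    """Root of the summed squared error, taken over stored nonzeros only.

    Note: a full fit measure would also account for the zero entries;
    this simplified version ignores them.
    """
    return sqrt(sum((value - cp_entry(factors, index)) ** 2
                    for index, value in coo.items()))

# Hypothetical 2 x 2 x 2 sparse tensor in coordinate form: {index: value}.
coo = {(0, 0, 0): 1.0, (1, 1, 1): 8.0}

# Rank-2 factor matrices chosen by hand so the model matches both nonzeros.
A = [[1.0, 0.0], [0.0, 2.0]]
B = [[1.0, 0.0], [0.0, 2.0]]
C = [[1.0, 0.0], [0.0, 2.0]]

model_value = cp_entry([A, B, C], (1, 1, 1))
error = residual_on_nonzeros(coo, [A, B, C])
```

With three modes this reduces to the familiar sum over r of A[i][r] * B[j][r] * C[k][r]. Algorithms such as those named in the abstract fit the factor matrices rather than taking them as given.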
