The Tucker tensor decomposition is a natural extension of the singular value decomposition (SVD) to multiway data. We propose to accelerate Tucker tensor decomposition algorithms by using randomization and parallelization. We present two algorithms that scale to large data and many processors, significantly reduce both computation and communication costs compared to previous deterministic and randomized approaches, and obtain nearly the same approximation error. The key idea in our algorithms is to perform randomized sketches with Kronecker-structured random matrices, which require less computation than unstructured random matrices and can be applied using a fundamental tensor computational kernel. We provide a probabilistic error analysis of our algorithms and implement a new parallel algorithm for the structured randomized sketch. Our experimental results demonstrate that combining randomization and parallelization produces accurate Tucker decompositions much faster than alternative approaches. We observe up to a 16× speedup over the fastest deterministic parallel implementation on 3D simulation data.
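To make the key idea concrete, here is a minimal serial sketch of a Kronecker-structured randomized sketch used inside a randomized HOSVD-style Tucker algorithm. It is not the paper's algorithms or its parallel implementation. The function names (`ttm`, `unfold`, `randomized_tucker_kron`, `reconstruct`), the reuse of one small Gaussian matrix per mode across all modes, and the chosen sketch sizes are illustrative assumptions. The point is that, instead of multiplying the mode-k unfolding by a large unstructured Gaussian matrix, the code applies a small Gaussian matrix in every other mode, one tensor-times-matrix (TTM) product at a time. Up to column ordering, this equals multiplying the unfolding by a Kronecker product of the small matrices.

```python
import numpy as np

def ttm(X, M, mode):
    """Mode-`mode` product X x_mode M, where M has shape (m, X.shape[mode])."""
    # tensordot contracts M's columns with X's `mode` axis; the new axis lands first
    Y = np.tensordot(M, X, axes=(1, mode))
    return np.moveaxis(Y, 0, mode)

def unfold(X, mode):
    """Mode-`mode` unfolding: rows indexed by `mode`, columns by the remaining modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def randomized_tucker_kron(X, ranks, sketch_sizes, rng):
    """Hypothetical sketch: HOSVD-style Tucker with Kronecker-structured random sketches."""
    d = X.ndim
    # Assumption: one small Gaussian matrix per mode, shape (s_j, n_j), reused for all modes k
    omegas = [rng.standard_normal((sketch_sizes[j], X.shape[j])) for j in range(d)]
    factors = []
    for k in range(d):
        # Multi-TTM: shrink every mode j != k with omegas[j]. The mode-k unfolding of Y
        # equals X_(k) times a Kronecker product of the omegas[j].T (up to column order),
        # so it has only prod_{j != k} s_j columns.
        Y = X
        for j in range(d):
            if j != k:
                Y = ttm(Y, omegas[j], j)
        # Orthonormal basis for the sketched column space; keep the leading r_k directions
        U, _, _ = np.linalg.svd(unfold(Y, k), full_matrices=False)
        factors.append(U[:, :ranks[k]])
    # Core tensor: project X onto every factor matrix
    G = X
    for k in range(d):
        G = ttm(G, factors[k].T, k)
    return G, factors

def reconstruct(G, factors):
    """Expand a Tucker decomposition back to a full tensor."""
    X = G
    for k, U in enumerate(factors):
        X = ttm(X, U, k)
    return X

# Usage: a 20 x 25 x 30 tensor with exact multilinear rank (2, 3, 2)
rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 2))
true_factors = [np.linalg.qr(rng.standard_normal((n, r)))[0]
                for n, r in [(20, 2), (25, 3), (30, 2)]]
X = reconstruct(core, true_factors)

G, factors = randomized_tucker_kron(X, ranks=(2, 3, 2), sketch_sizes=(4, 5, 4), rng=rng)
err = np.linalg.norm(X - reconstruct(G, factors)) / np.linalg.norm(X)
print(f"relative error: {err:.2e}")
```

Because this test tensor has exact low multilinear rank and each sketch size s_j is at least the corresponding rank r_j, the sketched unfoldings capture the true column spaces with probability one, so the relative error should be at the level of floating-point roundoff. For data that is only approximately low rank, one would typically oversample (take s_j somewhat larger than r_j); how the paper chooses its sketch sizes is not shown here.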