We present a data structure to randomly sample rows from the Khatri-Rao product of several matrices according to the exact distribution of its leverage scores. Our proposed sampler draws each row in time logarithmic in the height of the Khatri-Rao product and quadratic in its column count, with persistent space overhead at most the size of the input matrices. As a result, it tractably draws samples even when the matrices forming the Khatri-Rao product have tens of millions of rows each. When used to sketch the linear least squares problems arising in CANDECOMP / PARAFAC tensor decomposition, our method achieves lower asymptotic complexity per solve than recent state-of-the-art methods. Experiments on billion-scale sparse tensors validate our claims, with our algorithm achieving higher accuracy than competing methods as the decomposition rank grows.
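To make the sampling target concrete, the following is a minimal brute-force sketch of exact leverage-score sampling from a Khatri-Rao product. It is not the data structure proposed here: it materializes the full product and therefore costs time and memory linear in the product's height, which is exactly the cost the proposed sampler avoids. The function names (`khatri_rao`, `krp_leverage_scores`, `sample_rows`) are hypothetical and chosen for illustration. The sketch relies on the standard identity that the Gram matrix of a Khatri-Rao product equals the elementwise (Hadamard) product of the factor Gram matrices.

```python
import numpy as np


def khatri_rao(mats):
    """Materialize the Khatri-Rao product of matrices sharing a column count.

    Row (i_1, ..., i_N) of the result, flattened in C order, is the
    elementwise product of row i_1 of mats[0], ..., row i_N of mats[-1].
    """
    rank = mats[0].shape[1]
    out = np.ones((1, rank))
    for m in mats:
        # Pair every existing row with every row of the next factor.
        out = (out[:, None, :] * m[None, :, :]).reshape(-1, rank)
    return out


def krp_leverage_scores(mats):
    """Exact leverage scores of every row of the Khatri-Rao product."""
    rank = mats[0].shape[1]
    # Gram matrix of the product = Hadamard product of the factor Grams.
    gram = np.ones((rank, rank))
    for m in mats:
        gram *= m.T @ m
    gram_pinv = np.linalg.pinv(gram)
    a = khatri_rao(mats)  # brute force: height is the product of factor heights
    scores = np.einsum("ir,rs,is->i", a, gram_pinv, a)
    # Guard against tiny negative values caused by floating-point error.
    return np.clip(scores, 0.0, None)


def sample_rows(mats, n_samples, rng):
    """Draw row multi-indices with probability proportional to leverage."""
    scores = krp_leverage_scores(mats)
    probs = scores / scores.sum()
    flat = rng.choice(probs.size, size=n_samples, p=probs)
    # Convert flat row indices back to one index per factor matrix.
    multi = np.unravel_index(flat, [m.shape[0] for m in mats])
    return multi, probs
```

For factors with full column rank the scores sum to the column count, which gives a quick sanity check on small inputs. The baseline is only feasible when the product of the factor heights fits in memory, which is why a sampler with per-row cost logarithmic in that height matters at the scales described above.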