Approximate matrix multiplication with limited space has received increasing attention due to the emergence of large-scale applications. Recently, building on frequent directions, a popular matrix sketching algorithm, previous work introduced co-occuring directions (COD) to reduce the approximation error for this problem. For two input matrices $X\in\mathbb{R}^{m_x\times n}$ and $Y\in\mathbb{R}^{m_y\times n}$ and a sketch size $\ell$, COD has a space complexity of $O((m_x+m_y)\ell)$, but its time complexity is $O\left(n(m_x+m_y+\ell)\ell\right)$, which is still very high for large input matrices. In this paper, we propose to reduce the time complexity by exploiting the sparsity of the input matrices. The key idea is to employ an approximate singular value decomposition (SVD) method that can exploit sparsity, thereby reducing the number of QR decompositions required by COD. In this way, we develop sparse co-occuring directions, which reduces the time complexity to $\widetilde{O}\left((\nnz(X)+\nnz(Y))\ell+n\ell^2\right)$ in expectation while keeping the same space complexity of $O((m_x+m_y)\ell)$, where $\nnz(X)$ denotes the number of non-zero entries in $X$ and the $\widetilde{O}$ notation hides constant factors as well as polylogarithmic factors. Theoretical analysis reveals that the approximation error of our algorithm is almost the same as that of COD. Furthermore, we empirically verify the efficiency and effectiveness of our algorithm.
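For orientation, the dense COD baseline that the abstract refers to can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the sparse algorithm proposed in the paper and not the authors' code: the function names `cod_sketch` and `_shrink` are hypothetical, the sketch size `ell` is assumed to be even with `ell <= min(m_x, m_y)`, and the shrinkage threshold is taken to be the singular value at index `ell // 2`. Each time the buffers fill up, two QR decompositions and one small SVD are performed, which is the step whose frequency the sparse variant aims to reduce.

```python
import numpy as np


def _shrink(B_X, B_Y):
    """One COD shrinkage step: frees the last half of the sketch columns."""
    ell = B_X.shape[1]
    # Reduced QR of both buffers (assumes at least ell rows in each).
    Q_X, R_X = np.linalg.qr(B_X)
    Q_Y, R_Y = np.linalg.qr(B_Y)
    # SVD of the small ell x ell product R_X R_Y^T.
    U, s, Vt = np.linalg.svd(R_X @ R_Y.T)
    # Subtract the threshold singular value; entries from ell // 2 on become zero.
    delta = s[ell // 2]
    root = np.sqrt(np.maximum(s - delta, 0.0))
    return Q_X @ (U * root), Q_Y @ (Vt.T * root)


def cod_sketch(X, Y, ell):
    """Return B_X (m_x x ell), B_Y (m_y x ell) with B_X @ B_Y.T close to X @ Y.T."""
    m_x, n = X.shape
    m_y, n_y = Y.shape
    assert n == n_y, "X and Y must have the same number of columns"
    assert ell % 2 == 0 and ell <= min(m_x, m_y), "assumption of this sketch"
    B_X = np.zeros((m_x, ell))
    B_Y = np.zeros((m_y, ell))
    next_free = 0
    for i in range(n):
        if next_free == ell:
            # Buffers are full: shrink, which zeroes columns ell // 2 onward.
            B_X, B_Y = _shrink(B_X, B_Y)
            next_free = ell // 2
        # Insert the i-th column pair into the next free slot.
        B_X[:, next_free] = X[:, i]
        B_Y[:, next_free] = Y[:, i]
        next_free += 1
    return B_X, B_Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 200))
    Y = rng.standard_normal((20, 200))
    B_X, B_Y = cod_sketch(X, Y, ell=10)
    err = np.linalg.norm(X @ Y.T - B_X @ B_Y.T, 2)
    print("spectral error:", err)
```

The sketch stores only the two buffers of total size $(m_x+m_y)\ell$, matching the space complexity stated above, while the dense QR and SVD calls account for the running time that the paper sets out to reduce for sparse inputs.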