Many real-world datasets are naturally represented as sparse reorderable matrices, whose rows and columns can be arbitrarily ordered (e.g., the adjacency matrix of a bipartite graph). Storing a sparse matrix in conventional ways requires space linear in the number of non-zeros, and lossy compression of sparse matrices (e.g., truncated SVD) typically requires space linear in the numbers of rows and columns. In this work, we propose NeuKron for compressing a sparse reorderable matrix into constant-size space. NeuKron generalizes Kronecker products using a recurrent neural network with a constant number of parameters. NeuKron updates the parameters so that the given matrix is approximated by the product, and it reorders the rows and columns of the matrix to facilitate the approximation. The updates take time linear in the number of non-zeros in the input matrix, and the approximation of each entry can be retrieved in logarithmic time. We also extend NeuKron to compress sparse reorderable tensors (e.g., multi-layer graphs), which generalize matrices. Through experiments on ten real-world datasets, we show that NeuKron is (a) Compact: requiring up to five orders of magnitude less space than its best competitor with similar approximation errors, (b) Accurate: giving up to 10× smaller approximation error than its best competitors with similar-size outputs, and (c) Scalable: successfully compressing a matrix with over 230 million non-zero entries.
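To illustrate why a Kronecker-product model can be stored in constant space while still supporting logarithmic-time entry lookup, here is a minimal sketch that evaluates one entry of the k-th Kronecker power of a small seed matrix without materializing the full matrix. It shows only the plain Kronecker baseline that NeuKron generalizes, not NeuKron's recurrent model; the function name `kron_power_entry` and the example seed are hypothetical. Entry (i, j) is the product of seed entries selected by the base-p digits of i and the base-q digits of j, so a matrix with p^k rows needs only k = log_p(rows) multiplications per lookup.

```python
import numpy as np


def kron_power_entry(seed: np.ndarray, k: int, i: int, j: int) -> float:
    """Return entry (i, j) of seed ⊗ seed ⊗ ... ⊗ seed (k factors) in O(k) time.

    Hypothetical helper for illustration: a plain Kronecker power, not NeuKron.
    """
    p, q = seed.shape  # the seed may be rectangular (p rows, q columns)
    value = 1.0
    for _ in range(k):
        # Peel off the least-significant base-p digit of i and base-q digit of j;
        # each digit pair selects the seed entry used at one Kronecker level.
        i, r = divmod(i, p)
        j, c = divmod(j, q)
        value *= float(seed[r, c])
    # Any leftover quotient means the index was outside the p**k x q**k matrix.
    if i or j:
        raise IndexError("index exceeds the size of the k-th Kronecker power")
    return value


seed = np.array([[0.9, 0.6], [0.3, 0.1]])
# With k = 20 this indexes a 2**20 x 2**20 matrix using only 20 multiplications.
print(kron_power_entry(seed, 20, 123_456, 654_321))
```

Because multiplication is commutative, the order in which the digit pairs are consumed does not affect the result here. Per the abstract, NeuKron generalizes this fixed product by using a recurrent neural network with a constant number of parameters; the sketch makes no claim about that network's architecture.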