Sparse matrix representations are ubiquitous in computational science and machine learning, where they significantly reduce compute time relative to dense representations for problems with local connectivity. Support for sparse representations in leading ML frameworks such as PyTorch remains incomplete, however, lacking both automatic differentiation and GPU acceleration. In this work, we present a CSR-based sparse matrix wrapper for PyTorch that provides CUDA acceleration for basic matrix operations as well as automatic differentiability. We also apply the resulting sparse kernels to several optimization problems, demonstrating ease of implementation and reporting performance measurements against their dense counterparts.
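To make the CSR (compressed sparse row) layout and its role in differentiation concrete, the following is a minimal pure-Python sketch of a sparse matrix-vector product and the corresponding backward pass. It is an illustration under stated assumptions, not the implementation presented in this work: the names `CSRMatrix`, `csr_matvec`, and `csr_matvec_backward` are hypothetical, and the loops stand in for what would be a CUDA kernel in practice.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CSRMatrix:
    """Hypothetical CSR container; field names are illustrative only."""
    n_rows: int
    n_cols: int
    indptr: List[int]    # row i owns stored entries indptr[i]:indptr[i + 1]
    indices: List[int]   # column index of each stored entry
    data: List[float]    # value of each stored entry


def csr_matvec(A: CSRMatrix, x: List[float]) -> List[float]:
    """Forward pass: y = A @ x, touching only the stored entries."""
    y = [0.0] * A.n_rows
    for i in range(A.n_rows):
        for k in range(A.indptr[i], A.indptr[i + 1]):
            y[i] += A.data[k] * x[A.indices[k]]
    return y


def csr_matvec_backward(
    A: CSRMatrix, x: List[float], grad_y: List[float]
) -> Tuple[List[float], List[float]]:
    """Backward pass: given dL/dy, return (dL/dx, dL/d(data)) for y = A @ x."""
    grad_x = [0.0] * A.n_cols
    grad_data = [0.0] * len(A.data)
    for i in range(A.n_rows):
        for k in range(A.indptr[i], A.indptr[i + 1]):
            j = A.indices[k]
            grad_x[j] += A.data[k] * grad_y[i]  # accumulates A^T @ grad_y
            grad_data[k] = grad_y[i] * x[j]     # only stored entries get a gradient
    return grad_x, grad_data


# 3x3 tridiagonal matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] with 7 stored entries
A = CSRMatrix(
    n_rows=3,
    n_cols=3,
    indptr=[0, 2, 5, 7],
    indices=[0, 1, 0, 1, 2, 1, 2],
    data=[2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0],
)
x = [1.0, 2.0, 3.0]
y = csr_matvec(A, x)
grad_x, grad_data = csr_matvec_backward(A, x, [1.0, 1.0, 1.0])
```

In this sketch the gradient with respect to the matrix has the same sparsity pattern as the matrix itself, so the backward pass costs the same order of work as the forward pass and never materializes a dense array.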