Sparse matrix representations are ubiquitous in computational science and machine learning, where they significantly reduce compute time relative to dense representations for problems with local connectivity. Support for sparse representations in leading ML frameworks such as PyTorch remains incomplete, however: support for both automatic differentiation and GPU acceleration is missing. In this work, we present a compressed sparse row (CSR) based sparse matrix wrapper for PyTorch that provides CUDA acceleration for basic matrix operations as well as automatic differentiability. We also present several applications of the resulting sparse kernels to optimization problems, demonstrating their ease of implementation and measuring their performance against their dense counterparts.
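For readers unfamiliar with the CSR layout mentioned above, the following is a minimal pure-Python sketch of the format and of a sparse matrix-vector product over it. It is an illustration only and not the wrapper presented in this work: the function names `dense_to_csr` and `csr_matvec` are hypothetical, and the code uses neither PyTorch nor CUDA.

```python
def dense_to_csr(dense):
    """Convert a list-of-lists dense matrix into the three CSR arrays.

    Hypothetical helper for illustration; not part of any library.
    """
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)       # nonzero entries, stored row by row
                col_indices.append(j)  # column index of each stored entry
        row_ptr.append(len(values))    # offset where the next row starts
    return values, col_indices, row_ptr


def csr_matvec(values, col_indices, row_ptr, x):
    """Compute y = A @ x, touching only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Entries of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y


A = [
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 5.0],
]
values, col_indices, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_indices, row_ptr, [1.0, 2.0, 3.0])
```

The work per product scales with the number of stored nonzeros rather than with the full matrix size, which is the source of the savings for problems with local connectivity.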