Industrial scientific computing predominantly uses sparse matrices to represent unstructured data -- finite element meshes, graphs, and point clouds. We present \torchsla{}, an open-source PyTorch library for GPU-accelerated, scalable, and differentiable sparse linear algebra. The library addresses three fundamental challenges: (1) GPU acceleration for sparse linear solves, nonlinear solves (Newton, Picard, Anderson), and eigenvalue computation; (2) multi-GPU scaling via domain decomposition with halo exchange, reaching a \textbf{400-million-DOF linear solve on 3 GPUs}; and (3) adjoint-based differentiation achieving $\mathcal{O}(1)$ autograd computational-graph nodes and $\mathcal{O}(\text{nnz})$ memory, independent of the number of solver iterations. \torchsla{} supports multiple backends (SciPy, cuDSS, PyTorch-native) and integrates seamlessly with PyTorch autograd for end-to-end differentiable simulations. Code is available at https://github.com/walkerchi/torch-sla.
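The adjoint-based differentiation claim can be made concrete with a minimal sketch, not \torchsla{}'s actual API: for a linear solve $x = A^{-1} b$, the backward pass needs only one adjoint solve $A^\top \lambda = \partial L/\partial x$, which yields $\partial L/\partial b = \lambda$ and $\partial L/\partial A = -\lambda x^\top$. The autograd graph then holds a single node for the whole solve, regardless of how many iterations the solver ran. The class name \texttt{AdjointSolve} and the dense placeholder solver below are illustrative assumptions; the library would dispatch to a sparse backend instead.

```python
import torch

class AdjointSolve(torch.autograd.Function):
    """Differentiate through x = A^{-1} b with one graph node:
    backward solves a single adjoint system instead of unrolling
    the forward solver's iterations."""

    @staticmethod
    def forward(ctx, A, b):
        # Dense solve stands in for a sparse backend (illustrative only).
        x = torch.linalg.solve(A, b)
        ctx.save_for_backward(A, x)
        return x

    @staticmethod
    def backward(ctx, grad_x):
        A, x = ctx.saved_tensors
        # Adjoint system: A^T lambda = dL/dx
        lam = torch.linalg.solve(A.mT, grad_x)
        grad_b = lam                     # dL/db = lambda
        grad_A = -torch.outer(lam, x)    # dL/dA = -lambda x^T
        return grad_A, grad_b

# End-to-end use: gradients flow through the solve as a single node.
A = torch.randn(4, 4, dtype=torch.float64) + 4 * torch.eye(4, dtype=torch.float64)
A.requires_grad_(True)
b = torch.randn(4, dtype=torch.float64, requires_grad=True)
x = AdjointSolve.apply(A, b)
x.sum().backward()
```

Because only $A$ and $x$ are saved for backward, memory stays proportional to the matrix storage rather than to the iteration count, which is the $\mathcal{O}(\text{nnz})$ behavior the abstract describes.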