Many areas of science and engineering encounter data defined on spherical manifolds. Modelling and analysis of spherical data often necessitates spherical harmonic transforms at high degrees and increasingly requires efficient computation of gradients for machine learning or other differentiable programming tasks. We develop novel algorithmic structures for accelerated and differentiable computation of generalised Fourier transforms on the sphere $\mathbb{S}^2$ and rotation group $\text{SO}(3)$, i.e. spherical harmonic and Wigner transforms, respectively. We present a recursive algorithm for the calculation of Wigner $d$-functions that is both stable to high harmonic degrees and extremely parallelisable. By tightly coupling this recursion with separable spherical transforms, we obtain algorithms whose extremely parallelisable structure is well suited to the high-throughput computing of modern hardware accelerators (e.g. GPUs). We also develop a hybrid automatic and manual differentiation approach so that gradients can be computed efficiently. Our algorithms are implemented within the JAX differentiable programming framework in the S2FFT software code. Numerous samplings of the sphere are supported, including equiangular and HEALPix sampling. Computational errors are on the order of machine precision for spherical samplings that admit a sampling theorem. When benchmarked against alternative C codes, we observe up to a 400-fold acceleration. Furthermore, when distributing over multiple GPUs, we achieve very close to optimal linear scaling with increasing number of GPUs, owing to the highly parallelised and balanced nature of our algorithms. Given access to sufficiently many GPUs, our transforms thus exhibit an unprecedented effective linear time complexity.
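As a point of reference for the Wigner $d$-functions mentioned above, the following minimal sketch evaluates $d^{\ell}_{m'm}(\beta)$ from Wigner's textbook explicit sum, using only the Python standard library. It is not the recursive algorithm of this work and not part of the S2FFT API; the function name `wigner_d` and its argument order are assumptions made purely for illustration. The direct sum involves factorials that grow rapidly with $\ell$, so it is only suitable for small degrees, which is one motivation for a recursion that remains stable at high degrees.

```python
import math


def wigner_d(ell, m_prime, m, beta):
    """Wigner small-d function d^ell_{m',m}(beta) via Wigner's explicit sum.

    Illustrative only: hypothetical helper, intended for small integer ell.
    """
    # Square-root prefactor built from the four factorials.
    norm = math.sqrt(
        math.factorial(ell + m_prime) * math.factorial(ell - m_prime)
        * math.factorial(ell + m) * math.factorial(ell - m)
    )
    c = math.cos(beta / 2.0)
    s = math.sin(beta / 2.0)

    # Summation bounds chosen so that every factorial argument is non-negative.
    k_min = max(0, m - m_prime)
    k_max = min(ell + m, ell - m_prime)

    total = 0.0
    for k in range(k_min, k_max + 1):
        denom = (
            math.factorial(ell + m - k) * math.factorial(k)
            * math.factorial(m_prime - m + k) * math.factorial(ell - m_prime - k)
        )
        total += (
            (-1) ** (m_prime - m + k) / denom
            * c ** (2 * ell + m - m_prime - 2 * k)
            * s ** (m_prime - m + 2 * k)
        )
    return norm * total


if __name__ == "__main__":
    beta = 0.3
    # Compare against the closed form d^1_{00}(beta) = cos(beta).
    print(wigner_d(1, 0, 0, beta), math.cos(beta))
```

For fixed $\ell$ and $\beta$, the values $d^{\ell}_{m'm}(\beta)$ form a real orthogonal matrix in $(m', m)$, which gives a simple sanity check for any implementation, including faster recursive ones.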