We develop a matrix-free Full Approximation Storage (FAS) multigrid solver based on staggered finite differences and implemented on GPUs in MATLAB. To improve performance, intermediate variables are reused, and an X-shape Multi-Color Gauss-Seidel (X-MCGS) smoother is introduced that eliminates conditional branching by partitioning the grid into four submatrices. The restriction and prolongation operators are also GPU-accelerated. Convergence tests verify robustness and accuracy, and benchmarks show substantial speedups: for the 2D heat equation on an $8192^2$ grid, an RTX~4090 achieves a $61\times$ speedup over a single CPU core, and for the 3D case on a $512^3$ grid, $46\times$. A memory-efficient implementation of the first- and second-order projection schemes reduces the number of GPU-resident variables from 12 and 15, respectively, to 8, lowering the memory footprint and improving performance by 20--30\%, which enables $512^3$ Navier-Stokes simulations on a single GPU. Grain growth on a $512^2$ grid accommodates up to $q=1189$ (2D) and $q=123$ (3D) orientations and reproduces the expected scaling laws. Coupled with the Cahn-Hilliard equations, the solver simulates air-water two-bubble coalescence on a $256\times 256\times 1024$ grid, in agreement with experimental observations.
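The branch-free idea behind a four-color smoother can be illustrated with a minimal sketch. It is written in plain Python rather than the MATLAB GPU code described above, and it is not the authors' X-MCGS implementation: the abstract does not specify the X-shape ordering, so the color order used here (the two diagonal parity classes first, then the two off-diagonal ones) is an assumption, as are the model problem (5-point discretization of $-\Delta u = f$ with homogeneous Dirichlet boundaries) and all names (`COLORS`, `four_color_sweep`, `residual_norm`, `demo`). The point it shows is that each color is visited through a strided index range, so the update loop contains no per-point `if` test on the color.

```python
# Parity offsets (row, column) of the four colors. Assumed order: the two
# diagonal classes first, then the two off-diagonal classes.
COLORS = ((0, 0), (1, 1), (0, 1), (1, 0))


def residual_norm(u, f, h):
    """Max-norm of f - A u for the 5-point discretization of -Laplace(u) = f."""
    n = len(u)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            au = (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                  - u[i][j - 1] - u[i][j + 1]) / (h * h)
            worst = max(worst, abs(f[i][j] - au))
    return worst


def four_color_sweep(u, f, h):
    """One Gauss-Seidel sweep over the interior points, one color at a time.

    Each color is a strided subgrid (start offset 1 + parity, stride 2), so
    the inner loops need no conditional on the color of a point. Points of
    the same color never neighbor each other in the 5-point stencil, which
    is what would let each color be updated in parallel on a GPU.
    """
    n = len(u)
    h2 = h * h
    for pi, pj in COLORS:
        for i in range(1 + pi, n - 1, 2):
            for j in range(1 + pj, n - 1, 2):
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1]
                                  + h2 * f[i][j])
    return u


def demo(n=9, sweeps=200):
    """Smooth -Laplace(u) = 1 on an n-by-n grid; return the residual history."""
    h = 1.0 / (n - 1)
    u = [[0.0] * n for _ in range(n)]  # zero initial guess and boundary values
    f = [[1.0] * n for _ in range(n)]
    history = [residual_norm(u, f, h)]
    for _ in range(sweeps):
        four_color_sweep(u, f, h)
        history.append(residual_norm(u, f, h))
    return history


if __name__ == "__main__":
    hist = demo()
    print("initial residual:", hist[0])
    print("final residual:  ", hist[-1])
```

In an array language, each of the four strided index ranges would correspond to one strided subarray that can be updated with a single vectorized expression, which is one plausible reading of "partitioning the grid into four submatrices". Inside a multigrid cycle only a few such sweeps per level would be used; the many sweeps in `demo` serve only to show that the iteration converges on its own.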