Micromagnetic simulations are essential tools in nanomagnetism and spintronics research. Widely adopted solvers such as Mumax3 and the Python-native magnum.np use GPU acceleration to improve performance, but they are limited to single-device computation. In this work, we present the first Python-native multi-GPU micromagnetic framework, built by extending magnum.np with PyTorch Distributed. The framework exploits high-speed communication and computation across multiple GPUs while retaining magnum.np's ease of installation, platform-agnostic design, and compatibility with Python. For the computationally intensive demagnetisation effective-field calculation, we achieve a 7.0x speedup across 8 GPUs connected via NVLink, whereas the halo exchange required for the Heisenberg exchange field shows limited scaling due to kernel dispatch latency. We also demonstrate the framework's versatility by achieving a 6.8x speedup in the demagnetisation field computation on CPU, using NUMA pinning and the MPI backend of PyTorch Distributed. Faster turnaround times will enable researchers to explore larger, more complex systems and accelerate the design cycle for novel spintronic devices.
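As an illustration of the halo exchange mentioned above, the following is a minimal single-process sketch in NumPy, not the framework's implementation. It assumes a one-dimensional chain of cells split into slabs, a three-point Laplacian as a stand-in for the exchange field, and zero-gradient outer boundaries; the function names `laplacian_1d`, `exchange_with_halos`, and `reference_laplacian` are hypothetical. Each list entry plays the role of one device's subdomain, and the array slices copied between neighbours stand in for the inter-device transfers.

```python
import numpy as np


def laplacian_1d(m_padded, dx):
    """Three-point Laplacian along axis 0 of an array padded by one cell per side."""
    return (m_padded[2:] - 2.0 * m_padded[1:-1] + m_padded[:-2]) / dx**2


def exchange_with_halos(chunks, dx):
    """Emulate a halo exchange between neighbouring subdomains (one per 'rank').

    Assumption: in a real multi-device run the one-cell slices taken from the
    neighbours would be sent over the interconnect; here they are plain copies.
    """
    results = []
    last = len(chunks) - 1
    for rank, local in enumerate(chunks):
        # Halo cell from the left neighbour, or a copy of the own edge cell
        # at the outer boundary (zero-gradient condition).
        left = chunks[rank - 1][-1:] if rank > 0 else local[:1]
        # Halo cell from the right neighbour, handled the same way.
        right = chunks[rank + 1][:1] if rank < last else local[-1:]
        padded = np.concatenate([left, local, right], axis=0)
        results.append(laplacian_1d(padded, dx))
    return np.concatenate(results, axis=0)


def reference_laplacian(m, dx):
    """Same stencil applied to the undivided domain, for comparison."""
    padded = np.concatenate([m[:1], m, m[-1:]], axis=0)
    return laplacian_1d(padded, dx)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.normal(size=(32, 3))        # 32 cells, 3 magnetisation components
    chunks = np.array_split(m, 4)       # 4 emulated devices
    split = exchange_with_halos(chunks, dx=0.5)
    print(np.allclose(split, reference_laplacian(m, dx=0.5)))
```

Only one cell per subdomain face is exchanged, so each device does very little work per transfer. This is consistent with the observation that halo exchange scaling is limited by latency rather than by bandwidth.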