Exact diagonalization is a well-established method for simulating small quantum systems. Its applicability is limited by the exponential growth, with system size, of the dimension of the Hamiltonian matrix that must be diagonalized. Physical symmetries are usually exploited to reduce the matrix dimension, and distributed-memory parallelism is employed to reach larger systems. This paper focuses on the implementation of the core distributed algorithms, with special emphasis on the matrix-vector product. Instead of the conventional MPI+X paradigm, Chapel is chosen as the language for these distributed algorithms. We provide a comprehensive description of the algorithms and present performance and scalability tests. Our implementation outperforms the state-of-the-art MPI-based solution by a factor of 7 to 8 on 32 compute nodes (4096 cores) and exhibits very good scaling on up to 256 nodes (32768 cores). The implementation has three times fewer lines of code than the current state of the art while remaining fully generic.
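To make the two central ideas concrete (symmetry-reduced bases and a matrix-vector product that never stores the matrix), here is a minimal single-process sketch in Python. It is not the Chapel implementation described in this paper and involves no distribution across nodes. The model (a periodic spin-1/2 Heisenberg chain with unit coupling), the choice of magnetization conservation as the symmetry, and the function names `sector_basis` and `heisenberg_matvec` are all assumptions made for illustration.

```python
from bisect import bisect_left
from itertools import combinations


def sector_basis(n_sites, n_up):
    """Sorted bitstring states with exactly n_up up-spins (fixed magnetization).

    Illustrative assumption: only magnetization conservation is used to
    shrink the basis from 2**n_sites to C(n_sites, n_up) states.
    """
    states = []
    for ups in combinations(range(n_sites), n_up):
        state = 0
        for site in ups:
            state |= 1 << site
        states.append(state)
    states.sort()
    return states


def heisenberg_matvec(basis, n_sites, x):
    """Return y = H x for H = sum_i S_i . S_{i+1} on a periodic chain.

    Matrix elements are generated on the fly, so H is never stored.
    """
    y = [0.0] * len(basis)
    for col, state in enumerate(basis):
        for i in range(n_sites):
            j = (i + 1) % n_sites
            if ((state >> i) & 1) == ((state >> j) & 1):
                # Parallel spins: only the S^z S^z term contributes.
                y[col] += 0.25 * x[col]
            else:
                # Antiparallel spins: diagonal term plus a spin exchange.
                y[col] -= 0.25 * x[col]
                flipped = state ^ ((1 << i) | (1 << j))
                row = bisect_left(basis, flipped)  # index lookup in sorted basis
                y[row] += 0.5 * x[col]
    return y


# Usage: 4 sites with 2 up-spins gives 6 basis states instead of 16.
basis = sector_basis(4, 2)
y = heisenberg_matvec(basis, 4, [1.0] * len(basis))
```

In this small case the uniform vector is the fully symmetric (maximal total spin) state, so `y` equals the input vector. In a distributed setting, the index lookup and the write to `y[row]` are the steps that would generally touch data owned by another node, which is why the matrix-vector product is the natural focus of a distributed implementation.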