We present a technical enhancement within the p4est software for parallel adaptive mesh refinement. In p4est, mesh primitives are stored as octants in three dimensions and as quadrants in two. Classically, each primitive is encoded explicitly by its spatial coordinates and refinement level, but any mathematically equivalent encoding may be used instead. Recognizing this, we add two alternative representations to the classical, explicit version, based on a long monotonic index and on 128-bit AVX quad integers, respectively. The first requires changes to the logic of the low-level quadrant-manipulating algorithms, while the second exploits data-level parallelism and requires the algorithms to be adapted to SIMD instructions. The resulting algorithms and data structures yield higher performance and lower memory usage than the standard baseline. We benchmark selected algorithms on a cluster with two Intel(R) Xeon(R) Gold 6130 Skylake-family CPUs per node, which support the AVX2 extensions, 192 GB of RAM per node, and up to 512 computational cores in total.
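To illustrate what "mathematically equivalent encoding" can mean, the following minimal sketch packs an explicit 2D quadrant (coordinates plus level) into a single integer and recovers it again. It is an illustration only: it assumes that the monotonic index is a Morton (Z-order) index with the level stored in the low bits, and the names `MAXLEVEL`, `LEVEL_BITS`, `encode`, and `decode` as well as the bit layout are hypothetical choices made here, not p4est's actual data structures.

```python
# Hypothetical layout for this sketch: 2 * 29 Morton bits + 6 level bits = 64 bits.
MAXLEVEL = 29    # assumed maximum refinement depth
LEVEL_BITS = 6   # enough bits to store levels 0..MAXLEVEL


def interleave2(x, y, bits=MAXLEVEL):
    """Interleave the low `bits` bits of x and y; x occupies the even positions."""
    m = 0
    for i in range(bits):
        m |= ((x >> i) & 1) << (2 * i)
        m |= ((y >> i) & 1) << (2 * i + 1)
    return m


def encode(x, y, level):
    """Pack (x, y, level) into one integer: Morton index high, level low."""
    return (interleave2(x, y) << LEVEL_BITS) | level


def decode(key):
    """Recover the explicit (x, y, level) triple from a packed key."""
    level = key & ((1 << LEVEL_BITS) - 1)
    m = key >> LEVEL_BITS
    x = y = 0
    for i in range(MAXLEVEL):
        x |= ((m >> (2 * i)) & 1) << i
        y |= ((m >> (2 * i + 1)) & 1) << i
    return x, y, level


if __name__ == "__main__":
    h = 1 << (MAXLEVEL - 1)  # side length of a level-1 quadrant in integer units
    children = [(0, 0), (h, 0), (0, h), (h, h)]
    for cx, cy in children:
        key = encode(cx, cy, 1)
        print(hex(key), decode(key))
```

With this layout, comparing two packed keys as plain integers orders quadrants along the Z-curve, and an ancestor sharing its lower-left corner with a descendant sorts first because its level is smaller. That ordering property is the sense in which such an index is monotonic in this sketch.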