Wavelet-based grid adaptation driven by the "multiresolution analysis" (MRA) of the Haar wavelet (HW) makes it possible to devise an adaptive first-order finite volume (FV1) model (HWFV1) that readily preserves the modelling fidelity of its reference uniform-grid FV1 counterpart. However, the MRA is computationally expensive because it involves "encoding" (coarsening), "decoding" (refining), analysing and traversing modelled data across a deep hierarchy of nested, uniform grids. Parallelising the MRA on the GPU is needed to reduce this cost, but its algorithmic structure (1) hinders coalesced memory access on the GPU, and (2) involves an inherently sequential tree traversal problem. This work redesigns the algorithmic structure of the MRA so that it can be parallelised on the GPU, addressing (1) by applying Z-order space-filling curves and (2) by adopting a parallel tree traversal algorithm. The result is a GPU-parallelised HWFV1 model (GPU-HWFV1). GPU-HWFV1 is verified against its CPU predecessor (CPU-HWFV1) and its GPU-parallelised reference uniform-grid counterpart (GPU-FV1) over five shallow water flow test cases. GPU-HWFV1 preserves the modelling fidelity of GPU-FV1 while being up to 30 times faster. Compared to CPU-HWFV1, it is up to 200 times faster, suggesting that the GPU-parallelised MRA could be used to speed up other FV1 models.
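To illustrate why Z-order indexing helps with point (1), the following is a minimal Python sketch, not the GPU-HWFV1 implementation. It assumes a square grid of 2^n by 2^n cells and an unnormalised averaging form of the Haar filters; the function names (`morton_index`, `haar_encode`, `haar_decode`, `encode_level`) are hypothetical and chosen for illustration only. The sketch shows that the four children of any parent cell occupy four consecutive positions along the Z-order curve, so one encoding step reads contiguous memory.

```python
def part1by1(n: int) -> int:
    """Spread the lower 16 bits of n so that a zero bit follows each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_index(i: int, j: int) -> int:
    """Z-order (Morton) index of cell (i, j): bits of i go to even positions, bits of j to odd."""
    return part1by1(i) | (part1by1(j) << 1)


def haar_encode(children):
    """One encoding (coarsening) step on four child values in Z-order.

    Assumed child order: (i, j), (i+1, j), (i, j+1), (i+1, j+1).
    Returns the parent average and three detail coefficients.
    The unnormalised averaging used here is an assumption of this sketch.
    """
    c0, c1, c2, c3 = children
    parent = (c0 + c1 + c2 + c3) / 4
    d_h = (c0 - c1 + c2 - c3) / 4  # variation along i
    d_v = (c0 + c1 - c2 - c3) / 4  # variation along j
    d_d = (c0 - c1 - c2 + c3) / 4  # diagonal variation
    return parent, (d_h, d_v, d_d)


def haar_decode(parent, details):
    """One decoding (refining) step: exact inverse of haar_encode."""
    d_h, d_v, d_d = details
    return (
        parent + d_h + d_v + d_d,
        parent - d_h + d_v - d_d,
        parent + d_h - d_v - d_d,
        parent - d_h - d_v + d_d,
    )


def encode_level(fine):
    """Encode a whole level stored as a flat Z-ordered list of length 4^n.

    Parent k is built from the contiguous slice fine[4k : 4k + 4], and the
    returned parent list is itself in Z-order at the coarser level.
    """
    parents, details = [], []
    for k in range(len(fine) // 4):
        p, d = haar_encode(fine[4 * k: 4 * k + 4])
        parents.append(p)
        details.append(d)
    return parents, details
```

On a GPU, each iteration of the loop in `encode_level` would map to one thread, and neighbouring threads would then read neighbouring four-element slices; that mapping is the motivation for the Z-order layout, although the actual kernel design in GPU-HWFV1 may differ from this sketch.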