Modern computer systems are characterized by deep memory hierarchies composed of main memory, multiple layers of cache, and other specialized types of memory. Parallel and distributed systems add further layers to this hierarchy. For computational science applications, achieving good performance, measured as execution time, depends on using this diverse, hierarchical memory efficiently. This paper revisits the use of space-filling curves to specify the in-memory ordering of data structures in representative scientific applications running on parallel machines composed of clusters of multicore CPUs with attached GPUs. It examines the hypothesis that space-filling curve orderings, such as Hilbert and Morton ordering, can improve data locality and hence yield more efficient data movement than row- or column-based orderings. First, performance results are presented that show for which application parameterizations and machine characteristics this hypothesis holds, and these results are interpreted in terms of how an application interacts with the computer hardware and low-level software. The research focuses in particular on stencil-based applications, which form the basis of many scientific computations. Second, the impact of space-filling curves on data sharing in nearest-neighbour and stencil-based codes is considered.
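To make the contrast between the orderings concrete, the following is a minimal sketch of how a two-dimensional grid index might be mapped to a memory offset under row-major and Morton (Z-order) ordering. It is an illustration only and is not taken from the paper: the function names `row_major_index` and `morton_index`, the `bits` parameter, and the choice to place column bits in the even positions are all assumptions made for this example.

```python
def row_major_index(row: int, col: int, width: int) -> int:
    """Offset of element (row, col) in a row-major array with `width` columns."""
    return row * width + col


def morton_index(row: int, col: int, bits: int = 16) -> int:
    """Offset of element (row, col) under Morton (Z-order) ordering.

    The bits of `col` and `row` are interleaved: column bits occupy the
    even bit positions and row bits the odd ones (an assumed convention).
    Both coordinates must be non-negative and fit in `bits` bits.
    """
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)      # column bit b -> position 2b
        index |= ((row >> b) & 1) << (2 * b + 1)  # row bit b    -> position 2b + 1
    return index


if __name__ == "__main__":
    width = 1024  # hypothetical grid width, chosen only for illustration
    # Compare the offsets of a point and its neighbour one row below.
    for name, offset in (
        ("row-major", lambda r, c: row_major_index(r, c, width)),
        ("morton", morton_index),
    ):
        distance = abs(offset(1, 0) - offset(0, 0))
        print(f"{name}: vertical neighbour distance = {distance}")
```

In this sketch the vertical neighbour of element (0, 0) lies `width` elements away under row-major ordering but only two elements away under Morton ordering. This is the kind of locality a stencil update might exploit. The benefit is not uniform, however: neighbours that straddle a boundary between large Morton blocks can be far apart in memory, which is one reason the outcome depends on application parameters and machine characteristics.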