The layout of multi-dimensional data can have a significant impact on the efficacy of hardware caches and, by extension, the performance of applications. Common multi-dimensional layouts include the canonical row-major and column-major layouts as well as the Morton curve layout. In this paper, we describe how the Morton layout can be generalized to a very large family of multi-dimensional data layouts with widely varying performance characteristics. We posit that this design space can be efficiently explored using a combinatorial evolutionary methodology based on genetic algorithms. To this end, we propose a chromosomal representation for such layouts as well as a methodology for estimating the fitness of array layouts using cache simulation. We show that our fitness function correlates with kernel running time on real hardware, and that, within a small number of generations, our evolutionary strategy finds candidates with favorable simulated cache properties in four of the eight real-world applications under consideration. Finally, we demonstrate that the array layouts found using our evolutionary method perform well not only in simulated environments but can also yield significant performance gains on real hardware, up to a factor of ten in extreme cases.
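To make the idea of a family of Morton-like layouts concrete, the following is a minimal sketch under one assumption that the abstract itself does not spell out: that a layout can be described by the order in which the bits of each coordinate are interleaved into the linear offset. The function name `layout_index` and the list-based `pattern` encoding are hypothetical and chosen for illustration only; they are not claimed to be the chromosomal representation proposed in the paper.

```python
def layout_index(coords, pattern):
    """Map multi-dimensional coordinates to a linear offset.

    `pattern` lists, from the least to the most significant bit of the
    offset, which dimension supplies the next bit. (Hypothetical encoding,
    assumed here for illustration.)
    """
    taken = [0] * len(coords)  # number of bits already consumed per dimension
    offset = 0
    for out_bit, dim in enumerate(pattern):
        bit = (coords[dim] >> taken[dim]) & 1  # next unused bit of this coordinate
        offset |= bit << out_bit
        taken[dim] += 1
    return offset


# A 4x4 array with coords = (x, y) and two bits per dimension.
ROW_MAJOR = [0, 0, 1, 1]  # x bits lowest: offset = 4 * y + x
COL_MAJOR = [1, 1, 0, 0]  # y bits lowest: offset = 4 * x + y
MORTON = [0, 1, 0, 1]     # bits of x and y alternate

if __name__ == "__main__":
    for name, pattern in [("row-major", ROW_MAJOR),
                          ("column-major", COL_MAJOR),
                          ("Morton", MORTON)]:
        print(name, layout_index((3, 2), pattern))
```

Under this encoding, row-major, column-major, and Morton order are three points in the same space, and every other ordering of the same bits (for example `[0, 1, 1, 0]`) gives a further valid layout. The number of such orderings grows quickly with the number of dimensions and bits, which is consistent with the abstract's description of a very large design space suited to evolutionary search.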