The layout of multi-dimensional data can have a significant impact on the efficacy of hardware caches and, by extension, the performance of applications. Common multi-dimensional layouts include the canonical row-major and column-major layouts as well as the Morton curve layout. In this paper, we describe how the Morton layout can be generalized to a very large family of multi-dimensional data layouts with widely varying performance characteristics. We posit that this design space can be efficiently explored using a combinatorial evolutionary methodology based on genetic algorithms. To this end, we propose a chromosomal representation for such layouts as well as a methodology for estimating the fitness of array layouts using cache simulation. We show that our fitness function correlates with kernel running time on real hardware, and that our evolutionary strategy finds candidates with favorable simulated cache properties, within a small number of generations, for four of the eight real-world applications under consideration. Finally, we demonstrate that the array layouts found using our evolutionary method perform well not only in simulated environments but also on real hardware, where they can yield significant performance gains of up to a factor of ten in extreme cases.
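To make the idea of generalizing the Morton layout more concrete, the following is a minimal sketch and not the paper's actual encoding, which the abstract does not specify. It assumes power-of-two array extents and treats a layout as a sequence stating which dimension supplies each bit of the linear offset; the names `layout_index`, `ROW_MAJOR`, `COL_MAJOR`, and `MORTON` are hypothetical. Under these assumptions, row-major, column-major, and Morton order are all particular bit orderings, and such a sequence is one plausible candidate for a chromosome that a genetic algorithm could permute.

```python
def layout_index(coords, pattern):
    """Map coordinates to a linear offset by interleaving their bits.

    pattern[k] names the dimension that supplies bit k of the offset
    (k = 0 is the least significant bit). Each dimension's bits are
    consumed from least to most significant.
    Assumption: every extent is a power of two, so each dimension
    contributes a fixed number of bits.
    """
    next_bit = [0] * len(coords)  # next unread bit position per dimension
    offset = 0
    for k, dim in enumerate(pattern):
        bit = (coords[dim] >> next_bit[dim]) & 1
        offset |= bit << k
        next_bit[dim] += 1
    return offset


# 4x4 array, coords = (row, col), two bits per dimension.
ROW_MAJOR = (1, 1, 0, 0)  # column bits lowest: offset = 4 * row + col
COL_MAJOR = (0, 0, 1, 1)  # row bits lowest:    offset = 4 * col + row
MORTON = (1, 0, 1, 0)     # bits alternate between col and row (Z-order)

if __name__ == "__main__":
    for name, pattern in [("row-major", ROW_MAJOR),
                          ("column-major", COL_MAJOR),
                          ("Morton", MORTON)]:
        print(name, layout_index((2, 3), pattern))
```

Any ordering of the same multiset of dimension indices gives a distinct bijective layout in this sketch, which hints at why the design space grows combinatorially with the number of address bits.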