Contemporary accelerator designs are highly spatially localized: the two-dimensional physical distance between processing elements determines the cost of communication between them. This poses considerable algorithmic challenges, particularly for sparse data, which is central to progress in data science. The spatial computer model quantifies communication locality by weighting each communication between processors by the distance it covers, a cost it calls energy. It also incorporates depth, a widely used metric, to encourage high parallelism. We propose and analyze a framework for efficient spatial tree algorithms in the spatial computer model. Our primary method constructs a spatial tree layout that optimizes the locality of tree neighbors in the compute grid, which enables locality-optimized messaging within the tree. Our layout improves energy by a polynomial factor over a PRAM approach. Using this layout, we develop energy-efficient treefix sum and lowest common ancestor algorithms, both of which are fundamental building blocks for other graph algorithms. With high probability, our algorithms have near-linear energy and polylogarithmic depth. Our contributions add to a growing body of work demonstrating that computations can have both high spatial locality and low depth, and they advance the spatial layout of irregular and sparse computations.