From FORTRAN to NumPy, arrays have revolutionized how we express computation. However, arrays in these and almost all other prominent systems can only handle dense rectilinear integer grids. Real-world arrays often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Support for structured data is fragmented and incomplete. Existing frameworks simplify the problem by limiting the array structures and program control flow they support. In this work, we propose a new programming language, Finch, which supports both flexible control flow and diverse data structures. Finch facilitates a programming model that resolves the challenges of computing over structured arrays by combining control flow and data structures into a common representation where they can be co-optimized. Finch automatically specializes control flow to data so that performance engineers can focus on experimenting with many algorithms. Finch supports a familiar programming language of loops, statements, ifs, breaks, etc., over a wide variety of array structures, such as sparsity, run-length encoding, symmetry, triangles, padding, or blocks. Finch reliably utilizes the key properties of structure, such as structural zeros, repeated values, or clustered non-zeros. We show that this leads to dramatic speedups in operations such as SpMV and SpGEMM, image processing, and graph analytics, as well as in a high-level tensor operator fusion interface.
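The following minimal sketch makes "structural zeros" and "repeated values" concrete. It is plain Python, not Finch code and not Finch's API. The function names `spmv_csr` and `dot_rle` and the input layouts are hypothetical choices for illustration. It shows two hand-specialized loops of the kind the abstract says Finch derives automatically. The first is a sparse matrix-vector multiply over a CSR matrix that never visits structural zeros. The second is a dot product over a run-length-encoded vector that skips zero runs and handles each run of a repeated value with a single multiplication.

```python
# Hypothetical illustration only (not Finch's API): hand-written loops
# specialized to two array structures, to show what "exploiting structure" means.


def spmv_csr(n_rows, indptr, indices, data, x):
    """Compute y = A @ x for A stored in CSR (compressed sparse row) form.

    indptr[i]..indptr[i+1] delimits the stored entries of row i;
    indices holds their column numbers and data holds their values.
    Structural zeros are never stored, so they are never visited.
    """
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Iterate only over the stored (nonzero) entries of row i.
        for p in range(indptr[i], indptr[i + 1]):
            acc += data[p] * x[indices[p]]
        y[i] = acc
    return y


def dot_rle(runs, x):
    """Dot product of a run-length-encoded vector with a dense vector x.

    runs is a list of (value, length) pairs that together cover len(x) entries.
    A run of zeros is skipped entirely; a run of a repeated value v
    contributes v * sum(x[start:start + length]) with one multiplication.
    """
    total = 0.0
    start = 0
    for value, length in runs:
        if value != 0:
            total += value * sum(x[start:start + length])
        start += length
    return total
```

For example, the matrix `[[1, 0, 2], [0, 0, 3]]` in CSR form is `indptr=[0, 2, 3]`, `indices=[0, 2, 2]`, `data=[1.0, 2.0, 3.0]`. With `x=[1.0, 2.0, 3.0]`, `spmv_csr` returns `[7.0, 9.0]` after touching only the three stored entries. Likewise, the vector `[0, 0, 5, 5, 5, 1]` encoded as `[(0, 2), (5, 3), (1, 1)]` gives `dot_rle(..., [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) == 66.0`. Writing and maintaining such specializations by hand for every combination of formats and loop structures is the burden that the abstract says Finch removes.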