This paper introduces the continuous tensor abstraction, which allows indices to take real-number values (for example, A[3.14]). It also presents continuous tensor algebra expressions, such as C(x,y) = A(x,y) * B(x,y), where indices are defined over a continuous domain. This work extends the traditional tensor model to include continuous tensors. Our implementation supports piecewise-constant tensors, enabling infinite domains to be processed in finite time. We also introduce a new tensor format for efficient storage and a code generation technique for automatic kernel generation. Our abstraction is the first to express applications from domains such as computational geometry and computer graphics in the language of tensor programming. Across diverse applications, our approach demonstrates performance competitive with or better than hand-optimized kernels in leading libraries. Compared to hand-implemented libraries on a CPU, our compiler-based implementation achieves an average speedup of 9.20x on 2D radius search (with approximately 60x fewer lines of code (LoC)), 1.22x on genomic interval overlapping queries (with approximately 18x fewer LoC), and 1.69x on trilinear interpolation in Neural Radiance Fields (with approximately 6x fewer LoC).
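To make the abstraction concrete, the following is a minimal sketch of a 1D piecewise-constant continuous tensor with real-valued indexing and pointwise multiplication. The `PiecewiseConstant` class, its breakpoint representation, and its zero-outside-domain convention are illustrative assumptions, not the paper's actual tensor format or generated code; the sketch only shows how a piecewise-constant representation lets an infinite index domain be evaluated and combined in finite time.

```python
import bisect

class PiecewiseConstant:
    """Hypothetical 1D piecewise-constant continuous tensor: the value is
    constant on each interval [breaks[i], breaks[i+1]) and zero elsewhere.
    This is an illustrative sketch, not the paper's tensor format."""

    def __init__(self, breaks, vals):
        assert len(breaks) == len(vals) + 1
        self.breaks, self.vals = breaks, vals

    def __call__(self, x):
        # Real-valued index, e.g. A(3.14): binary-search the interval containing x.
        if x < self.breaks[0] or x >= self.breaks[-1]:
            return 0.0
        i = bisect.bisect_right(self.breaks, x) - 1
        return self.vals[i]

    def __mul__(self, other):
        # Pointwise product C(x) = A(x) * B(x) over the whole continuous domain:
        # merge both breakpoint sets, then multiply the constants per interval.
        pts = sorted(set(self.breaks) | set(other.breaks))
        vals = [self(lo) * other(lo) for lo in pts[:-1]]
        return PiecewiseConstant(pts, vals)

A = PiecewiseConstant([0.0, 1.0, 2.0], [2.0, 3.0])   # A = 2 on [0,1), 3 on [1,2)
B = PiecewiseConstant([0.5, 1.5], [10.0])            # B = 10 on [0.5,1.5)
C = A * B                                            # finite work, infinite domain
print(C(0.7))  # 2.0 * 10.0 = 20.0
print(C(1.2))  # 3.0 * 10.0 = 30.0
print(C(1.8))  # B is zero at 1.8, so 0.0
```

The pointwise product touches only the finitely many merged breakpoints, which is the key property that makes computation over a continuous (infinite) index domain terminate.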