This paper introduces the continuous tensor abstraction, which allows indices to take real-number values (for example, A[3.14]). It also presents continuous tensor algebra expressions, such as C(x,y) = A(x,y) * B(x,y), where indices range over a continuous domain. This work expands the traditional tensor model to include continuous tensors. Our implementation supports piecewise-constant tensors, enabling infinite domains to be processed in finite time. We also introduce a new tensor format for efficient storage and a code generation technique for automatic kernel generation. Our abstraction expresses, for the first time, domains such as computational geometry and computer graphics in the language of tensor programming. Across diverse applications, our approach demonstrates performance competitive with or better than hand-optimized kernels in leading libraries. Compared to hand-implemented libraries on a CPU, our compiler-based implementation achieves an average speedup of 9.20x on 2D radius search (with approximately 60x fewer lines of code (LoC)), 1.22x on genomic interval overlapping queries (with approximately 18x fewer LoC), and 1.69x on trilinear interpolation in Neural Radiance Fields (with approximately 6x fewer LoC).
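To make the abstraction concrete, here is a minimal sketch (not the paper's implementation or format) of a 1D piecewise-constant continuous tensor: sorted real breakpoints delimit half-open pieces, each holding a constant value, so an infinite domain is stored finitely and a pointwise product C(x) = A(x) * B(x) reduces to merging the two breakpoint sets. The class and function names are hypothetical.

```python
from bisect import bisect_right

class PiecewiseConstant:
    """Hypothetical 1D piecewise-constant tensor: values[i] holds the
    constant value on the half-open piece [breaks[i], breaks[i+1]);
    the tensor is 0 outside the listed pieces."""
    def __init__(self, breaks, values):
        # breaks has one more entry than values (fence posts vs. fences)
        assert len(breaks) == len(values) + 1
        self.breaks, self.values = breaks, values

    def __call__(self, x):
        # Continuous indexing, e.g. A(3.14): locate the piece containing x.
        if x < self.breaks[0] or x >= self.breaks[-1]:
            return 0.0
        return self.values[bisect_right(self.breaks, x) - 1]

def pointwise_mul(a, b):
    # C(x) = A(x) * B(x): merge both breakpoint sets, then evaluate each
    # factor once per merged piece (both are constant on every such piece).
    breaks = sorted(set(a.breaks) | set(b.breaks))
    values = [a(lo) * b(lo) for lo in breaks[:-1]]
    return PiecewiseConstant(breaks, values)

A = PiecewiseConstant([0.0, 1.0, 2.0], [2.0, 3.0])
B = PiecewiseConstant([0.5, 1.5], [10.0])
C = pointwise_mul(A, B)  # nonzero only where A and B overlap: [0.5, 1.5)
```

The merge step is what keeps the computation finite: although C is defined at every real x, it only needs as many pieces as the two inputs have breakpoints combined.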