Tenspiler: A Verified Lifting-Based Compiler for Tensor Operations

Tensor processing infrastructures such as deep learning frameworks and specialized hardware accelerators have revolutionized how computationally intensive code from domains such as deep learning and image processing is executed and optimized. These infrastructures provide powerful and expressive abstractions while ensuring high performance. However, to utilize them, code must be written specifically against the APIs/ISAs of these software frameworks or hardware accelerators. Importantly, given the fast pace of innovation in these domains, code written today quickly becomes legacy as new frameworks and accelerators are developed, and migrating such legacy code manually requires considerable effort. To enable developers to leverage these domain-specific languages (DSLs) while preserving their current programming paradigm, we introduce Tenspiler, a verified lifting-based compiler that uses program synthesis to translate sequential programs written in general-purpose programming languages (e.g., C++ or Python) into tensor operations. Central to Tenspiler is TensIR, a carefully crafted yet simple intermediate language for expressing tensor operations, which enables efficient lifting, verification, and code generation. Tenspiler currently supports \textbf{six} DSLs spanning a broad spectrum of software and hardware environments. Furthermore, we show that Tenspiler can easily support new backends through simple pattern-matching rules on TensIR. Our experimental evaluation on 10 real-world benchmark suites shows that, by translating code to run on \textbf{six} different software frameworks and hardware devices, Tenspiler achieves on average a 105$\times$ speedup in kernel execution time and a 9.65$\times$ speedup in end-to-end execution time over fully optimized sequential implementations of the same benchmarks.

翻译：张量处理基础设施（如深度学习框架和专用硬件加速器）彻底改变了深度学习与图像处理等高计算密集型代码的执行与优化方式。这些基础设施在提供强大且富有表现力的抽象能力的同时，确保了卓越的性能表现。然而，要利用这些设施，必须使用相关软件框架或硬件加速器的API/ISA编写代码。值得注意的是，鉴于这些领域快速创新的特性，随着新框架与加速器的涌现，当前编写的代码很快会沦为遗留代码，而手动迁移此类遗留代码需要投入大量精力。为使开发者能够在保留现有编程范式的同时利用这些领域特定语言，我们提出Tenspiler——一种基于提升的可验证编译器，通过程序合成技术将通用编程语言（如C++或Python代码）编写的顺序程序转换为张量操作。Tenspiler的核心在于我们精心设计且简洁的中间语言TensIR，该语言能够表达张量操作，并支持高效的提升、验证与代码生成。目前，Tenspiler已支持\textbf{六种}覆盖广泛软件与硬件环境的领域特定语言。此外，我们证明通过为TensIR添加简单的模式匹配规则，即可轻松支持新的后端。基于10个真实代码基准测试套件的实验评估表明，通过将代码转换至\textbf{6种}不同软件框架与硬件设备上执行，Tenspiler相较于同一基准测试的完全优化顺序实现，平均实现105$\times$的内核性能提升与9.65$\times$的端到端执行时间改进。