Tensor processing infrastructures such as deep learning frameworks and specialized hardware accelerators have revolutionized how computationally intensive code from domains such as deep learning and image processing is executed and optimized. These infrastructures provide powerful and expressive abstractions while ensuring high performance. However, to use them, code must be written specifically against the APIs or ISAs of these software frameworks and hardware accelerators, which we refer to collectively as domain-specific languages (DSLs). Moreover, given the fast pace of innovation in these domains, code written today quickly becomes legacy as new frameworks and accelerators are developed, and migrating such legacy code manually takes considerable effort. To enable developers to leverage such DSLs while preserving their current programming paradigm, we introduce Tenspiler, a verified lifting-based compiler that uses program synthesis to translate sequential programs written in general-purpose programming languages (e.g., C++ or Python) into tensor operations. Central to Tenspiler is TensIR, our carefully crafted yet simple intermediate language for expressing tensor operations. TensIR enables efficient lifting, verification, and code generation. Tenspiler currently supports \textbf{six} DSLs, spanning a broad spectrum of software and hardware environments. Furthermore, we show that Tenspiler can easily support new backends through the addition of simple pattern-matching rules for TensIR. On 10 real-world benchmark suites, our experimental evaluation shows that, by translating code to run on \textbf{six} different software frameworks and hardware devices, Tenspiler improves kernel execution time by 105$\times$ and end-to-end execution time by 9.65$\times$ on average over the fully optimized sequential implementations of the same benchmarks.
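To make the idea of lifting concrete, the following minimal Python sketch shows a sequential loop, a tensor-style expression that computes the same result, and a small set of pattern-matching rules that print that expression for a NumPy-like backend. Everything in it is an illustrative assumption: the operator names (`scalar_mul`, `vec_add`), the tuple encoding, and the functions `evaluate` and `emit_numpy` are hypothetical and are not TensIR's actual operators or Tenspiler's API. The sketch also only compares outputs on sample inputs, which is not a substitute for the verification described above.

```python
# Hypothetical illustration only: the operator names and the tuple encoding
# below are assumptions made for this sketch, not TensIR's real definition.

def axpy_sequential(a, x, y):
    """Legacy-style sequential kernel: out[i] = a * x[i] + y[i]."""
    out = []
    for i in range(len(x)):
        out.append(a * x[i] + y[i])
    return out


# A toy tensor-style expression tree built from nested tuples:
#   ("var", name), ("scalar_mul", scalar, vector), ("vec_add", lhs, rhs)
LIFTED_AXPY = ("vec_add", ("scalar_mul", ("var", "a"), ("var", "x")), ("var", "y"))


def evaluate(expr, env):
    """Reference interpreter for the toy expression tree (plain Python lists)."""
    op = expr[0]
    if op == "var":
        return env[expr[1]]
    if op == "scalar_mul":
        scalar = evaluate(expr[1], env)
        vector = evaluate(expr[2], env)
        return [scalar * e for e in vector]
    if op == "vec_add":
        lhs = evaluate(expr[1], env)
        rhs = evaluate(expr[2], env)
        return [p + q for p, q in zip(lhs, rhs)]
    raise ValueError(f"unknown operator: {op}")


def emit_numpy(expr):
    """One pattern-matching rule per operator, printing a NumPy-style expression."""
    op = expr[0]
    if op == "var":
        return expr[1]
    if op == "scalar_mul":
        return f"({emit_numpy(expr[1])} * {emit_numpy(expr[2])})"
    if op == "vec_add":
        return f"({emit_numpy(expr[1])} + {emit_numpy(expr[2])})"
    raise ValueError(f"unknown operator: {op}")


if __name__ == "__main__":
    env = {"a": 2, "x": [1, 2, 3], "y": [10, 20, 30]}
    # Sample-input check only; a real lifting pipeline would verify equivalence.
    assert axpy_sequential(env["a"], env["x"], env["y"]) == evaluate(LIFTED_AXPY, env)
    print(emit_numpy(LIFTED_AXPY))
```

In this sketch, targeting another backend would mean writing a second emitter with one rule per operator, which mirrors at toy scale the claim that new backends need only simple pattern-matching rules over the intermediate language.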