Dedicated tensor accelerators demonstrate the importance of linear algebra in modern applications. Such accelerators can deliver impressive performance gains, but they require programmers to rewrite code using vendor APIs, a barrier to wider adoption. Recent work overcomes this by matching and replacing patterns within code, but such approaches are fragile and fail to cope with the diversity of real-world codes. We develop ATC, a compiler that uses program synthesis to map regions of code to specific APIs. The mapping space that ATC explores is combinatorially large, so making it tractable requires program classification, dynamic analysis, variable constraint generation, and lexical distance matching techniques. We apply ATC to real-world tensor and linear algebra codes and evaluate it against four state-of-the-art approaches. ATC accelerates between 2.6x and 7x more programs than these approaches, leading to over an order of magnitude performance improvement.
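To make the idea of mapping a code region to an API concrete, the toy sketch below enumerates candidate argument bindings for an API call and keeps those whose outputs agree with a hand-written kernel on test inputs. It is not ATC's algorithm: `user_matmul`, `api_gemm`, `find_bindings`, and the candidate sets are hypothetical names and choices made for illustration, and `api_gemm` is a pure-Python stand-in rather than a real vendor call. Even this tiny case has 12 candidates, which hints at why the full mapping space grows combinatorially.

```python
import itertools


def user_matmul(x, y, n):
    """Hand-written kernel standing in for existing user code: returns x @ y."""
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                out[i][j] += x[i][k] * y[k][j]
    return out


def api_gemm(alpha, a, b, n):
    """Hypothetical stand-in for an accelerator API: returns alpha * (a @ b)."""
    return [
        [alpha * sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def find_bindings(tests, n):
    """Enumerate (A, B, alpha) bindings and keep those matching the user code.

    Assumption: behavioural agreement on the given tests is used as the
    acceptance criterion; a real system would need far stronger evidence.
    """
    matches = []
    for a_name, b_name in itertools.product(("x", "y"), repeat=2):
        for alpha in (0.0, 1.0, 2.0):

            def agrees(x, y):
                env = {"x": x, "y": y}
                candidate = api_gemm(alpha, env[a_name], env[b_name], n)
                return candidate == user_matmul(x, y, n)

            if all(agrees(x, y) for x, y in tests):
                matches.append((a_name, b_name, alpha))
    return matches


# One fixed test pair whose product is not commutative, so swapped
# operands are rejected.
TESTS = [([[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]])]

print(find_bindings(TESTS, 2))  # → [('x', 'y', 1.0)]
```

Exhaustive enumeration is only feasible here because the candidate sets are tiny; the techniques the abstract lists (program classification, dynamic analysis, variable constraint generation, lexical distance matching) are what the authors use to prune the space in practice.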