Performance optimization is an increasingly challenging but often repetitive task. While each platform has its quirks, the underlying code transformations rely on data movement and computational characteristics that recur across applications. This paper proposes to leverage those similarities by constructing an embedding space for subprograms. The continuous space captures both static and dynamic properties of loop nests via symbolic code analysis and performance profiling, respectively. Performance embeddings enable direct transfer of performance-tuning knowledge between applications, whether that knowledge comes from autotuning or from tailored improvements. We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils. Transfer tuning reduces the search complexity by up to four orders of magnitude and outperforms the MKL library in sparse-dense matrix multiplication. The results exhibit clear correspondences between program characteristics and optimizations, outperforming prior specialized state-of-the-art approaches and generalizing beyond their capabilities.
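The transfer step summarized in the abstract can be pictured as a nearest-neighbor lookup in the embedding space: a new loop nest is embedded, and the optimizations recorded for its closest already-tuned neighbors become the first candidates to try. The sketch below is only an illustration of that idea under stated assumptions, not the paper's implementation. The three-dimensional vectors, the recipe names, the Euclidean metric, and the function `transfer_candidates` are all hypothetical; the abstract does not specify how the embedding is computed or compared.

```python
import math

# Hypothetical database of already-tuned loop nests. Each entry pairs an
# embedding vector with the optimization recipe that worked for that nest.
# Both the vectors and the recipe names are made up for illustration.
TUNED = [
    ((0.9, 0.1, 0.2), "tile+vectorize"),
    ((0.1, 0.8, 0.7), "loop-fusion"),
    ((0.2, 0.2, 0.9), "parallelize-outer"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_candidates(query, database, k=2):
    """Return the recipes of the k nearest tuned loop nests, nearest first."""
    ranked = sorted(database, key=lambda entry: euclidean(query, entry[0]))
    return [recipe for _, recipe in ranked[:k]]


# Embedding of a new, untuned loop nest (also made up).
candidates = transfer_candidates((0.85, 0.15, 0.25), TUNED, k=2)
print(candidates)
```

Trying only the `k` nearest recipes instead of searching the full transformation space is what would shrink the search in such a scheme; how much it shrinks in practice depends on the quality of the embedding, which this sketch does not model.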