Tuning tensor program generation involves searching over the possible combinations of program transformations for a given program on target hardware in order to optimize tensor program execution. The process is already complex because the search space is massive, and the exponential number of transformation combinations makes auto-tuning tensor program generation more challenging still, especially for heterogeneous targets. In this research, we attempt to address these problems by learning joint neural network and hardware features and transferring them to new target hardware. We extensively study the existing state-of-the-art dataset, TenSet, perform a comparative analysis of test split strategies, and propose methodologies to prune the dataset. We adopt an attention-inspired approach for tuning the tensor programs, enabling them to embed neural network and hardware-specific features. Our approach could prune the dataset up to 45% of the baseline without compromising the Pairwise Comparison Accuracy (PCA). Further, the proposed methodology can achieve on-par or improved mean inference time with 25%-40% of the baseline tuning time across different networks and target hardware.
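The abstract reports results in terms of Pairwise Comparison Accuracy (PCA) without defining it. As a minimal sketch, assuming PCA is the fraction of program pairs that a cost model ranks in the same order as their measured values, the metric could be computed as follows. The function name, the argument names, and the choice to skip tied measurements are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def pairwise_comparison_accuracy(predicted, measured):
    """Fraction of program pairs whose predicted order matches the measured order.

    Assumption: pairs with identical measured values carry no ordering
    information and are skipped. A tie in the predictions counts as incorrect.
    """
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")

    correct = 0
    total = 0
    for i, j in combinations(range(len(measured)), 2):
        if measured[i] == measured[j]:
            continue  # tied measurements: skip (assumption)
        total += 1
        # The pair is ranked correctly when both differences share a sign.
        if (predicted[i] - predicted[j]) * (measured[i] - measured[j]) > 0:
            correct += 1

    # With no comparable pairs the metric is undefined; return 0.0 by convention.
    return correct / total if total else 0.0
```

Under this definition, a cost model that scores programs randomly would land near 0.5, so a pruned dataset "without compromising PCA" would mean the trained model orders candidate programs about as reliably as one trained on the full data.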