As increasingly diverse high-performance computing (HPC) systems are built, applications gain opportunities to solve larger problems than ever before. Because these HPC systems and the applications that run on them have become significantly more complex to tune, empirical performance tuning, such as autotuning, has emerged as a promising approach in recent years. Despite its effectiveness, autotuning is often computationally expensive. Transfer learning (TL)-based autotuning seeks to address this issue by leveraging data from prior tuning. Current TL methods for autotuning spend significant time modeling the relationship between parameter configurations and performance, which makes them ineffective for few-shot tuning (that is, tuning with few empirical evaluations) on new tasks. We introduce the first generative TL-based autotuning approach based on the Gaussian copula (GC), which models the high-performing regions of the search space from prior data and then generates high-performing configurations for new tasks. This enables a sampling-based approach that maximizes few-shot performance and provides the first probabilistic estimate of the few-shot budget for effective TL-based autotuning. We compare our generative TL approach with state-of-the-art autotuning techniques on several benchmarks. We find that the GC achieves 64.37% of peak few-shot performance in its first evaluation. Furthermore, the GC model can determine a few-shot transfer budget that yields up to 33.39$\times$ speedup, a dramatic improvement over the 20.58$\times$ speedup achieved by prior techniques.
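To make the generative idea concrete, the following is a minimal sketch of Gaussian-copula sampling over prior tuning data, written with the Python standard library only. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fit_copula`, `sample`, `few_shot_budget`), the `top_fraction` cutoff, the rank-based marginals, and the geometric-trials budget formula are all hypothetical choices made here for clarity. The abstract does not specify how the budget is estimated.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for both CDF and inverse CDF


def _normal_scores(column):
    """Map one parameter's values to normal scores via empirical ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = _STD.inv_cdf((rank + 0.5) / n)
    return scores


def _correlation(cols):
    """Correlation matrix of the (zero-mean) normal-score columns."""
    d = len(cols)
    corr = [[1.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(a):
            num = sum(x * y for x, y in zip(cols[a], cols[b]))
            den = math.sqrt(sum(x * x for x in cols[a]) * sum(y * y for y in cols[b]))
            corr[a][b] = corr[b][a] = num / den
    return corr


def _cholesky(m):
    """Lower-triangular Cholesky factor, clamped for near-singular input."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def fit_copula(configs, runtimes, top_fraction=0.3):
    """Fit a Gaussian copula to the fastest prior configurations.

    ASSUMPTION: "high-performing" means the top_fraction with lowest runtime.
    """
    keep = max(2, int(len(configs) * top_fraction))
    ranked = sorted(zip(runtimes, configs), key=lambda pair: pair[0])
    best = [cfg for _, cfg in ranked[:keep]]
    per_param = [list(col) for col in zip(*best)]
    marginals = [sorted(col) for col in per_param]  # empirical marginals
    scores = [_normal_scores(col) for col in per_param]
    return marginals, _cholesky(_correlation(scores))


def sample(model, rng):
    """Draw one configuration: correlated normals -> uniforms -> marginals."""
    marginals, low = model
    d = len(marginals)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    correlated = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
    config = []
    for col, zi in zip(marginals, correlated):
        u = _STD.cdf(zi)
        config.append(col[min(int(u * len(col)), len(col) - 1)])
    return tuple(config)


def few_shot_budget(hit_probability, confidence=0.95):
    """Smallest k with 1 - (1 - p)^k >= confidence.

    ASSUMPTION: samples are independent and each hits the target region
    with probability p; this is a simplification for illustration.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_probability))
```

In this sketch, `fit_copula` takes a list of parameter tuples and their measured runtimes from a prior task, and `sample(model, random.Random(seed))` proposes a configuration for a new task without fitting any performance model. Because the marginals are empirical, every sampled value is one that already appeared among the retained prior configurations.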