As deep learning (DL) models grow larger and more complex, improving the efficiency of model training and inference becomes critical. Although a variety of highly optimized libraries and packages (known as DL kernels) have been developed, it is tedious and time-consuming to figure out which kernel to use, where to use it, and how to use it correctly. To address this challenge, we propose an Automated Deep learning OPTimization approach called Adopter. We design a Domain-Specific Language (DSL) to represent DL model architectures and leverage this DSL to specify the model transformation rules required to integrate a DL kernel into a model. Given the source code of a DL model and the transformation rules for a set of kernels, Adopter first performs inter-procedural analysis to identify the model architecture and express it in our DSL. Then, Adopter performs scope analysis and sub-sequence matching to identify locations in the model architecture where the transformation rules can be applied. Finally, Adopter applies the matched transformation rules using a synthesis-based code transformation method. We curated a benchmark with 199 models from Hugging Face and a diverse set of DL kernels. Compared to a state-of-the-art automated code transformation technique, Adopter improves precision and recall by 3% and 56%, respectively. An in-depth analysis of 9 models revealed that, on average, Adopter improved training speed by 22.7% while decreasing GPU memory usage by 10.5%.
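To make the matching step concrete, the following is a minimal sketch of rule matching over a model architecture. It is not Adopter's DSL or implementation: it assumes the architecture has already been flattened into a list of layer names, treats "sub-sequence" as a contiguous run, and omits scope analysis entirely. The `Rule` class, the helper functions, and every layer name (including `FusedDropoutAddLayerNorm`) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical transformation rule: replace `source` layers with `target`."""
    source: tuple[str, ...]
    target: tuple[str, ...]

    def __post_init__(self) -> None:
        # An empty pattern would match everywhere and never advance.
        if not self.source:
            raise ValueError("rule source pattern must not be empty")


def find_matches(layers: list[str], rule: Rule) -> list[int]:
    """Return start indices where rule.source occurs as a contiguous run."""
    n, m = len(layers), len(rule.source)
    return [i for i in range(n - m + 1) if tuple(layers[i:i + m]) == rule.source]


def apply_rule(layers: list[str], rule: Rule) -> list[str]:
    """Rewrite every non-overlapping match, scanning left to right."""
    out: list[str] = []
    i, m = 0, len(rule.source)
    while i < len(layers):
        if tuple(layers[i:i + m]) == rule.source:
            out.extend(rule.target)  # substitute the (assumed) fused kernel
            i += m
        else:
            out.append(layers[i])
            i += 1
    return out


# Toy architecture and rule; all names are illustrative assumptions.
model = ["Linear", "Dropout", "Add", "LayerNorm", "Linear", "GELU"]
fuse = Rule(
    source=("Dropout", "Add", "LayerNorm"),
    target=("FusedDropoutAddLayerNorm",),
)

print(find_matches(model, fuse))
print(apply_rule(model, fuse))
```

A real system would match against a richer representation than layer names (for example, including data flow and the scope in which each layer is defined) and would then rewrite the model's source code rather than a list of strings.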