Artificial intelligence applications have proliferated, and model training is key to delivering high-quality services for them. However, model training is both time-intensive and energy-intensive, which inevitably conflicts with users' demand for efficient applications. Layer freezing, an efficient model training technique, has been proposed to improve training efficiency. Although existing layer freezing methods show great potential to reduce training costs, they still suffer from shortcomings such as limited generalizability and compromised accuracy. For instance, existing methods either require the freezing configuration to be defined manually before training, which does not transfer across networks, or rely on heuristic freezing criteria that cannot reliably guarantee good accuracy across scenarios. What is missing, therefore, is a generic and smart layer freezing method that can automatically perform "in-situation" layer freezing for different networks during training. To this end, we propose SmartFRZ, a generic and efficient training framework. The core technique in SmartFRZ is attention-guided layer freezing, which automatically selects the appropriate layers to freeze without compromising accuracy. Experimental results show that SmartFRZ effectively reduces the amount of computation in training, achieves significant training acceleration, and outperforms state-of-the-art layer freezing approaches.
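To make the basic mechanism concrete, the following minimal sketch illustrates layer freezing in general: once a layer is marked frozen, its weights are excluded from further updates, so its gradients no longer need to be computed. This is not the SmartFRZ method. The `Layer` class, the function names, and the gradient-norm threshold are hypothetical, and the freezing criterion is a deliberately simple placeholder of the heuristic kind discussed above, standing in for the attention-guided selection that the abstract describes only at a high level.

```python
import math


class Layer:
    """Hypothetical stand-in for a network layer: a flat list of weights."""

    def __init__(self, name, weights):
        self.name = name
        self.weights = list(weights)
        self.frozen = False


def l2_norm(values):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in values))


def freeze_small_gradient_layers(layers, grads, threshold=1e-2):
    """Placeholder criterion (an assumption, NOT SmartFRZ's attention-based
    selection): freeze a layer once its gradient norm drops below threshold."""
    for layer in layers:
        if not layer.frozen and l2_norm(grads[layer.name]) < threshold:
            layer.frozen = True


def sgd_step(layers, grads, lr=0.1):
    """Plain SGD that skips frozen layers; returns the number of weights updated."""
    updated = 0
    for layer in layers:
        if layer.frozen:
            continue  # frozen layers are left untouched
        layer.weights = [w - lr * g for w, g in zip(layer.weights, grads[layer.name])]
        updated += len(layer.weights)
    return updated


# Toy usage with made-up weights and gradients.
layers = [Layer("conv1", [0.5, -0.2]), Layer("fc", [1.0, 2.0, 3.0])]
grads = {"conv1": [0.001, 0.002], "fc": [0.5, -0.5, 0.1]}
freeze_small_gradient_layers(layers, grads)
num_updated = sgd_step(layers, grads)
```

In this toy run only `conv1` falls below the threshold, so the update touches the three `fc` weights and leaves `conv1` unchanged. A fixed threshold of this kind is exactly the sort of criterion that may not hold up across networks, which is the gap the proposed framework targets.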