Computing power has become a foundational and indispensable resource in deep learning, particularly for tasks such as Face Recognition (FR) model training on large-scale datasets, where multiple GPUs are often a necessity. Recognizing this challenge, some FR methods have begun exploring ways to compress the fully-connected layer in FR models. In contrast to these approaches, our observations reveal that without timely scheduling of the learning rate (LR) during FR model training, the loss curve tends to exhibit numerous stationary subsequences. To address this issue, we introduce a novel LR scheduler leveraging an Exponential Moving Average (EMA) and a Haar Convolutional Kernel (HCK) to eliminate stationary subsequences, yielding a significant reduction in convergence time. However, this scheduler incurs considerable computational overhead due to its time complexity. To overcome this limitation, we propose FastFace, a fast-converging scheduler with negligible time complexity, i.e., O(1) per iteration, during training. In practice, FastFace accelerates FR model training to a quarter of its original time while sacrificing no more than 1% accuracy, making large-scale FR training feasible on a single GPU in terms of both time and space complexity. Extensive experiments validate the efficiency and effectiveness of FastFace. The code is publicly available at: https://github.com/amoonfana/FastFace
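To illustrate the idea of combining an EMA with a Haar convolutional kernel to detect stationary subsequences in a loss curve, the following is a minimal sketch. It is not the paper's implementation: the function names (`ema`, `haar_response`, `plateau_detected`) and the hyperparameters (`alpha`, `window`, `tol`) are illustrative assumptions; the response is simply the EMA-smoothed loss convolved with a step-shaped Haar kernel, whose magnitude drops toward zero when the curve flattens.

```python
import numpy as np

def ema(values, alpha=0.1):
    """Exponential moving average smoothing of a loss sequence."""
    out = np.empty(len(values))
    acc = values[0]
    for i, v in enumerate(values):
        acc = alpha * v + (1 - alpha) * acc
        out[i] = acc
    return out

def haar_response(smoothed, window=10):
    """Convolve with a Haar step kernel ([+1]*window then [-1]*window).
    The response magnitude approximates the local trend of the curve;
    a near-zero value signals a stationary subsequence (plateau)."""
    kernel = np.concatenate([np.ones(window), -np.ones(window)]) / window
    return np.convolve(smoothed, kernel, mode="valid")

def plateau_detected(loss_history, window=10, tol=1e-3):
    """Return True if the EMA-smoothed loss shows a stationary subsequence."""
    if len(loss_history) < 2 * window:
        return False  # not enough history to fill the Haar kernel
    smoothed = ema(np.asarray(loss_history, dtype=float))
    return abs(haar_response(smoothed, window)[-1]) < tol
```

A scheduler built on this detector would, for example, multiply the LR by a decay factor whenever `plateau_detected(losses)` fires. Note that recomputing the convolution over the full history each iteration is what makes such a scheduler costly, which motivates an O(1)-per-iteration alternative like FastFace.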