Although Deep Learning (DL) has achieved state-of-the-art performance across a wide range of research domains, accelerating training and building robust DL models remain challenging tasks. To this end, generations of researchers have sought to develop robust methods for training DL architectures that are less sensitive to weight distributions, model architectures, and loss landscapes. However, such methods are limited to adaptive learning rate optimizers, initialization schemes, and gradient clipping, and do not investigate the fundamental parameter update rule. Although multiplicative updates contributed significantly to the early development of machine learning and have strong theoretical support, to the best of our knowledge this is the first work that investigates them in the context of DL training acceleration and robustness. In this work, we propose an optimization framework that fits a wide range of optimization algorithms and allows alternative update rules to be applied. Within this framework, we propose a novel multiplicative update rule and extend its capabilities by combining it with a traditional additive update term in a novel hybrid update method. We claim that the proposed framework accelerates training while leading to more robust models compared with the traditionally used additive update rule, and we experimentally demonstrate its effectiveness on a wide range of tasks and optimization methods. These tasks range from convex and non-convex optimization to difficult image classification benchmarks, using a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.
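To make the distinction between additive, multiplicative, and hybrid updates concrete, the following is a minimal sketch on a convex toy problem. The abstract does not state the exact form of the proposed rules, so the multiplicative form used here (scaling each weight by `1 - lr * sign(w) * grad`) and the hybrid form (that multiplicative step plus a plain gradient step) are illustrative assumptions, not the authors' method. All function names, the quadratic objective, and the learning rates are likewise hypothetical.

```python
import numpy as np


def additive_step(w, grad, lr):
    """Traditional additive update: w <- w - lr * grad."""
    return w - lr * grad


def multiplicative_step(w, grad, lr):
    """Assumed multiplicative form (illustrative, not taken from the paper).

    Each weight is rescaled by a factor that depends on the gradient:
    w <- w * (1 - lr * sign(w) * grad), which equals w - lr * |w| * grad.
    """
    return w * (1.0 - lr * np.sign(w) * grad)


def hybrid_step(w, grad, lr_mul, lr_add):
    """Assumed hybrid form: a multiplicative step plus an additive term."""
    return multiplicative_step(w, grad, lr_mul) - lr_add * grad


def minimize(step, w0, target, n_steps=200, **step_kwargs):
    """Minimize the toy objective 0.5 * ||w - target||^2 with a given rule."""
    w = w0.copy()
    for _ in range(n_steps):
        grad = w - target  # gradient of the quadratic objective
        w = step(w, grad, **step_kwargs)
    return w


w0 = np.array([0.5, -2.0, 3.0])
target = np.array([1.0, -1.0, 2.0])

w_add = minimize(additive_step, w0, target, lr=0.1)
w_mul = minimize(multiplicative_step, w0, target, lr=0.1)
w_hyb = minimize(hybrid_step, w0, target, lr_mul=0.05, lr_add=0.05)
```

In this assumed form, the size of a multiplicative step scales with the magnitude of the weight, so a weight that is exactly zero never moves. The additive term in the hybrid step removes that fixed point, which is one plausible motivation for combining the two rules in this sketch.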