Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction

From arXiv, 24 pages. Published as a conference paper at ECML PKDD 2021. This version includes the appendix, which was omitted from the published version because of the page limit.

We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, and AdaHessian, and creates a new class of optimizers, named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad, and Group AdaHessian, accordingly. We establish convergence guarantees in the stochastic convex setting, based on primal-dual methods. We evaluate the regularization effect of our new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results show that, compared with the original optimizers followed by magnitude pruning as a post-processing step, our optimizers yield significantly better model performance at the same sparsity level. Furthermore, compared with the original optimizers without magnitude pruning, our methods achieve extremely high sparsity with significantly better or highly competitive performance. The code is available at https://github.com/intelligent-machine-learning/dlrover/blob/master/tfplus.
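To make the regularizer concrete, here is a minimal NumPy sketch of a standard sparse group lasso penalty and its proximal operator for a single group of weights. It is an illustration only, not the update rule of the Group optimizers in the paper, which are derived with primal-dual methods and are not spelled out in the abstract. The function names, the `sqrt(group size)` weighting, and the idea of treating one embedding vector as one group are assumptions made for this example.

```python
import numpy as np

def sparse_group_lasso_penalty(w, lam1, lam2):
    """Penalty for one group: lam1 * ||w||_1 + lam2 * sqrt(d) * ||w||_2.

    The sqrt(d) group weight is a common convention, assumed here.
    """
    return lam1 * np.abs(w).sum() + lam2 * np.sqrt(w.size) * np.linalg.norm(w)

def prox_sparse_group_lasso(w, lam1, lam2, step=1.0):
    """Proximal operator of the penalty above for one group (hypothetical helper).

    Step 1: element-wise soft-thresholding handles the l1 term.
    Step 2: group-wise shrinkage handles the l2 term and can zero the whole group.
    """
    u = np.sign(w) * np.maximum(np.abs(w) - step * lam1, 0.0)
    norm = np.linalg.norm(u)
    thresh = step * lam2 * np.sqrt(w.size)
    if norm <= thresh:
        return np.zeros_like(w)  # the entire group is pruned
    return (1.0 - thresh / norm) * u

# A weak group (e.g. the embedding of a rarely seen feature) is removed entirely,
# while a strong group is only shrunk.
weak = prox_sparse_group_lasso(np.array([0.05, -0.02, 0.01]), lam1=0.03, lam2=0.1)
strong = prox_sparse_group_lasso(np.array([3.0, -4.0]), lam1=0.0, lam2=1.0)
```

Unlike magnitude pruning, which thresholds weights after training, a step like this applies sparsity pressure at the group level during optimization, which is the general idea behind comparing the two approaches at equal sparsity.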

