Modern recommendation systems frequently employ online learning to dynamically update their models with freshly collected data. The most commonly used optimizer for updating neural networks in these contexts is the Adam optimizer, which combines momentum ($m_t$) with an adaptive learning rate ($v_t$). However, the volatile nature of online learning data, characterized by frequent distribution shifts and the presence of noise, poses significant challenges to Adam's standard optimization process: (1) Adam may rely on outdated momentum and averages of squared gradients, resulting in slower adaptation to distribution changes, and (2) Adam's performance is adversely affected by data noise. To mitigate these issues, we introduce CAdam, a confidence-based optimization strategy that assesses the consistency between the momentum and the gradient for each parameter dimension before deciding on updates. If momentum and gradient are in sync, CAdam proceeds with parameter updates according to Adam's original formulation; if not, it temporarily withholds updates and monitors potential shifts in data distribution in subsequent iterations. This method allows CAdam to distinguish between true distribution shifts and mere noise, and to adapt more quickly to new data distributions. Our experiments on both synthetic and real-world datasets demonstrate that CAdam surpasses other well-known optimizers, including the original Adam, in efficiency and noise robustness. Furthermore, in large-scale A/B testing within a live recommendation system, CAdam significantly enhances model performance compared to Adam, leading to substantial increases in the system's gross merchandise volume (GMV).
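The per-dimension confidence check described above can be sketched as a variant of a single Adam step. The following is a minimal illustrative sketch, not the paper's exact algorithm: the precise masking rule, bias-correction details, and hyperparameters may differ, and the agreement test here (sign agreement between the updated momentum and the current gradient) is an assumption.

```python
import numpy as np

def cadam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One CAdam-style step (illustrative sketch).

    Updates only the dimensions where momentum and gradient agree in sign;
    other dimensions are withheld for this iteration.
    """
    m = b1 * m + (1 - b1) * g        # first moment (momentum)
    v = b2 * v + (1 - b2) * g * g    # second moment (avg. of squared gradients)
    m_hat = m / (1 - b1 ** t)        # bias correction, as in Adam
    v_hat = v / (1 - b2 ** t)
    confident = (m * g) > 0          # momentum and gradient point the same way
    step = np.where(confident, m_hat / (np.sqrt(v_hat) + eps), 0.0)
    return theta - lr * step, m, v
```

When the data distribution truly shifts, the gradient keeps disagreeing with the stale momentum, the momentum decays toward the new gradient direction over a few iterations, and updates resume; a transient noisy gradient only causes a brief one-step hold.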