We propose Loss-Controlled Asymmetric Momentum (LCAM), the simplest enhancement to SGD to date, aimed directly at the saddle point problem. Compared with traditional SGD with momentum, it adds no computational cost, yet it outperforms all current optimizers. We use the concepts of weight conjugation and the traction effect to explain this phenomenon. We designed experiments that rapidly reduce the learning rate at specified epochs, so that parameters are more easily trapped at saddle points. We selected WRN28-10 as the test network and CIFAR-10 and CIFAR-100 as the test datasets, the same setup used in the original WRN and Cosine Annealing Scheduling (CAS) papers. We compared the ability of Asymmetric Momentum with different priorities to bypass saddle points. Finally, using WRN28-10 on CIFAR-100, we achieved a peak average test accuracy of 80.78% at around epoch 120. For comparison, the original WRN paper reported 80.75% and CAS reported 80.42%, both at 200 epochs. This means that, while potentially increasing accuracy, our method converges in a little over half the training epochs. Our demonstration code is available at https://github.com/hakumaicc/Asymmetric-Momentum-LCAM
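To make the two ingredients named above concrete, here is a minimal Python sketch of a learning rate that drops sharply at specified epochs and a momentum coefficient chosen from the current loss. It is an illustration under stated assumptions, not the implementation from the linked repository: the function names, the milestone epochs, the momentum values 0.9 and 0.95, the running-average loss used as the threshold, and the `favor_high_loss` flag (standing in for the "priority") are all hypothetical.

```python
def step_lr(epoch, base_lr=0.1, milestones=(30, 60), gamma=0.1):
    """Learning rate that is multiplied by `gamma` at each milestone epoch.

    The milestones and gamma are illustrative assumptions.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


def select_momentum(loss, mean_loss, high=0.95, low=0.9, favor_high_loss=True):
    """Pick a momentum coefficient by comparing the loss with a reference loss.

    `favor_high_loss` is a hypothetical stand-in for the priority: it decides
    whether the larger momentum is applied when the loss is above or below
    the reference.
    """
    above = loss > mean_loss
    return high if above == favor_high_loss else low


def train_toy(epochs=90, w=5.0):
    """Run momentum SGD on the toy objective f(w) = w**2."""
    v = 0.0
    mean_loss = w * w  # assumed reference: running average of past losses
    for epoch in range(epochs):
        loss = w * w
        grad = 2.0 * w
        m = select_momentum(loss, mean_loss)
        v = m * v - step_lr(epoch) * grad  # standard momentum update
        w += v
        mean_loss = 0.9 * mean_loss + 0.1 * loss
    return w


if __name__ == "__main__":
    print("final w:", train_toy())
```

The only change relative to plain SGD with momentum is the one comparison in `select_momentum`, which is why a scheme of this kind adds essentially no computational cost per step.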