Arc Gradient Descent: A Geometrically Motivated Gradient Descent-based Optimiser with Phase-Aware, User-Controlled Step Dynamics (proof-of-concept)

This paper presents the formulation, implementation, and evaluation of the ArcGD optimiser. ArcGD is evaluated first on a non-convex benchmark function and then on a real-world ML dataset. The first study compares ArcGD with the Adam optimiser on a stochastic variant of the Rosenbrock function, a highly non-convex and notoriously challenging benchmark known for its narrow, curved valley, across dimensions ranging from 2D to 1000D, plus an extreme case of 50,000D. To eliminate learning-rate bias, two configurations were evaluated: (i) both optimisers using ArcGD's effective learning rate and (ii) both using Adam's default learning rate. ArcGD consistently outperformed Adam under the first setting and, although slower under the second, achieved superior final solutions in most cases. In the second study, ArcGD is compared against state-of-the-art optimisers (Adam, AdamW, Lion, SGD) on the CIFAR-10 image classification dataset across 8 diverse MLP architectures with 1 to 5 hidden layers. ArcGD achieved the highest average test accuracy (50.7%) at 20,000 iterations, outperforming AdamW (46.6%), Adam (46.8%), SGD (49.6%), and Lion (43.4%), and winning or tying on 6 of 8 architectures. Notably, Adam and AdamW showed strong early convergence at 5,000 iterations but regressed with extended training, whereas ArcGD continued improving, demonstrating generalization and resistance to overfitting without requiring early-stopping tuning. Strong performance on both geometric stress tests and standard deep-learning benchmarks indicates broad applicability and highlights the need for further exploration. It is also shown that both a limiting variant of ArcGD and a momentum-augmented ArcGD recover sign-based momentum updates, revealing a clear conceptual link between ArcGD's phase structure and the core mechanism of the Lion optimiser.
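To make the two reference points in the abstract concrete, the following is a minimal sketch of an N-dimensional Rosenbrock function with its gradient and of a Lion-style sign-based momentum step. It does not reproduce the ArcGD update itself. The coupled Rosenbrock form, the additive Gaussian noise in `noisy_grad`, the hyperparameter defaults, and all function names are assumptions made for illustration; the abstract does not specify which stochastic variant is used, and weight decay is omitted from the Lion step for brevity.

```python
import random


def rosenbrock(x):
    """Coupled N-dimensional Rosenbrock function; its minimum is 0 at x = (1, ..., 1)."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


def rosenbrock_grad(x):
    """Analytic gradient of the coupled Rosenbrock function above."""
    g = [0.0] * len(x)
    for i in range(len(x) - 1):
        r = x[i + 1] - x[i] ** 2  # residual of the curved-valley term
        g[i] += -400.0 * x[i] * r - 2.0 * (1.0 - x[i])
        g[i + 1] += 200.0 * r
    return g


def noisy_grad(x, rng, sigma=0.01):
    """Hypothetical stochastic variant: exact gradient plus Gaussian noise.

    This noise model is an assumption, not the paper's definition.
    """
    return [gi + rng.gauss(0.0, sigma) for gi in rosenbrock_grad(x)]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(x, m, g, lr=1e-4, beta1=0.9, beta2=0.99):
    """One Lion-style step (no weight decay): move by the sign of an
    interpolation between the momentum and the current gradient, then
    update the momentum with a second interpolation factor."""
    x_new = [
        xi - lr * sign(beta1 * mi + (1.0 - beta1) * gi)
        for xi, mi, gi in zip(x, m, g)
    ]
    m_new = [beta2 * mi + (1.0 - beta2) * gi for mi, gi in zip(m, g)]
    return x_new, m_new


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0] * 10          # starting point (arbitrary choice)
    m = [0.0] * len(x)      # momentum buffer
    for _ in range(1000):
        x, m = lion_step(x, m, noisy_grad(x, rng))
    print(rosenbrock(x))
```

Because each coordinate moves by exactly `lr` per step regardless of gradient magnitude, a sign-based update behaves very differently from Adam in the steep walls and flat floor of the Rosenbrock valley, which is the kind of contrast the comparison above is probing.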