We introduce a new method inspired by Adam that enhances convergence speed and achieves better loss function minima. Traditional optimizers, including Adam, apply uniform or globally adjusted learning rates across neural networks without considering their architectural specifics. This architecture-agnostic approach is deeply embedded in most deep learning frameworks, where optimizers are implemented as standalone modules without direct access to the network's structural information. For instance, in popular frameworks such as Keras or PyTorch, optimizers operate solely on gradients and parameters, with no knowledge of layer connectivity or network topology. Our algorithm, CaAdam, explores this overlooked area by introducing connection-aware optimization through carefully designed proxies of architectural information. We propose multiple scaling methodologies that dynamically adjust learning rates based on easily accessible structural properties such as layer depth, connection counts, and gradient distributions. This approach enables more granular optimization while working within the constraints of current deep learning frameworks. Empirical evaluations on standard datasets (e.g., CIFAR-10, Fashion MNIST) show that our method consistently achieves faster convergence and higher accuracy than the standard Adam optimizer, demonstrating the potential benefits of incorporating architectural awareness into optimization strategies.
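To make the idea of connection-aware scaling concrete, the sketch below layers an illustrative depth-based multiplier on top of a standard Adam update. The linear depth interpolation, the `gamma` hyperparameter, and the function name `caadam_like_step` are assumptions for illustration only; the paper proposes several scaling methodologies (depth-, connection-count-, and gradient-distribution-based) whose exact formulas are not reproduced here.

```python
import numpy as np

def caadam_like_step(params, grads, state, depths, max_depth,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, gamma=0.5):
    """One Adam-style update with a hypothetical depth-based scale.

    params / grads: lists of NumPy arrays, one per layer.
    state: dict with step counter "t" and per-layer moments "m", "v".
    depths[i]: depth of layer i; max_depth: depth of the deepest layer.
    """
    state["t"] += 1
    t = state["t"]
    for i, (p, g) in enumerate(zip(params, grads)):
        m, v = state["m"][i], state["v"][i]
        # Standard Adam moment estimates with bias correction.
        m[:] = beta1 * m + (1 - beta1) * g
        v[:] = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Hypothetical connection-aware proxy: shallower layers receive a
        # larger step, deeper layers a smaller one (linear interpolation).
        scale = 1.0 + gamma * (1.0 - 2.0 * depths[i] / max_depth)
        p -= lr * scale * m_hat / (np.sqrt(v_hat) + eps)
    return params
```

The only change relative to vanilla Adam is the per-layer `scale` factor, which is computed from structural metadata (here, layer depth) that an optimizer could obtain without breaking the standalone-module design of existing frameworks.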