In this study, we introduce H-Fac, a novel adaptive optimizer that incorporates a factorized approach to momentum and scaling parameters. Our algorithm demonstrates competitive performance on both ResNets and Vision Transformers, while achieving sublinear memory costs through rank-1 parameterizations of the moment estimators. We develop our algorithms based on principles derived from Hamiltonian dynamics, which provide robust theoretical underpinnings. The resulting optimization algorithms are straightforward and adaptable, facilitating easy implementation in diverse settings.
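The sublinear memory claim comes from storing factored statistics instead of full per-parameter moment matrices. The following is a minimal, hedged sketch of Adafactor-style rank-1 second-moment factorization for intuition only; the function name and details are illustrative and do not reproduce the paper's H-Fac algorithm:

```python
import numpy as np

def factored_second_moment_update(r, c, grad, beta2=0.999, eps=1e-30):
    """One EMA step of a rank-1 (row/column) second-moment estimate.

    Instead of an m x n matrix of squared-gradient statistics, we keep
    only a row vector r (m,) and a column vector c (n,), then rebuild a
    rank-1 approximation on the fly. Illustrative sketch only.
    """
    g2 = grad ** 2 + eps
    r = beta2 * r + (1 - beta2) * g2.sum(axis=1)  # row sums, shape (m,)
    c = beta2 * c + (1 - beta2) * g2.sum(axis=0)  # col sums, shape (n,)
    # Rank-1 reconstruction of the full second-moment matrix:
    v_hat = np.outer(r, c) / r.sum()
    return r, c, v_hat

m, n = 4, 3
rng = np.random.default_rng(0)
r, c = np.zeros(m), np.zeros(n)
for _ in range(10):
    r, c, v_hat = factored_second_moment_update(r, c, rng.normal(size=(m, n)))
print(v_hat.shape)  # full (4, 3) matrix rebuilt from only m + n stored numbers
```

The memory saved per weight matrix is O(m + n) instead of O(mn), which is the sense in which such factorized estimators achieve sublinear memory.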