The multilayer perceptron (MLP) is a widely used backbone in modern deep learning (DL) architectures for supervised learning on tabular data, and AdamW is the go-to optimizer for training tabular DL models. Unlike architecture design, however, the choice of optimizer for tabular DL has not been examined systematically, even though new optimizers have shown promise in other domains. To fill this gap, we benchmark 15 optimizers on 17 tabular datasets for training MLP-based models in the standard supervised learning setting, using a shared experimental protocol. Our main finding is that the Muon optimizer consistently outperforms AdamW and should therefore be considered a strong, practical choice for practitioners and researchers, provided its additional training overhead is affordable. We also find that an exponential moving average (EMA) of model weights is a simple yet effective technique that improves AdamW on vanilla MLPs, although its effect is less consistent across model variants.
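To make the two techniques named above concrete, here is a minimal NumPy sketch. It is not the benchmark's code. It assumes the formulation of the public Muon reference implementation: heavy-ball momentum with a Nesterov-style lookahead, followed by a quintic Newton-Schulz iteration (coefficients 3.4445, -4.7750, 2.0315) that approximately orthogonalizes the update. It omits details such as per-shape learning-rate scaling. The function names `newton_schulz_orthogonalize`, `muon_step`, and `ema_update` and all default hyperparameters are hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximate U @ V.T from the SVD g = U S V.T (push singular values toward 1).

    Quintic Newton-Schulz iteration with the coefficients used in the public Muon
    reference implementation; the result is only approximately orthogonal
    (singular values land roughly in [0.7, 1.2]).
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + eps)  # Frobenius scaling puts every singular value in [0, 1]
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # work with the wide orientation so the Gram matrix is the smaller one
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        # Applies p(s) = a*s + b*s**3 + c*s**5 to each singular value s of x
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One simplified Muon update on a 2-D weight matrix; weight and momentum change in place."""
    momentum *= beta
    momentum += grad                 # heavy-ball momentum buffer
    update = grad + beta * momentum  # Nesterov-style lookahead
    weight -= lr * newton_schulz_orthogonalize(update)
    return weight, momentum


def ema_update(ema_params, params, decay=0.999):
    """In-place EMA of weights: ema <- decay * ema + (1 - decay) * current."""
    for name, p in params.items():
        ema_params[name] *= decay
        ema_params[name] += (1.0 - decay) * p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 4))
    m = np.zeros_like(w)
    ema = {"w": w.copy()}
    for _ in range(10):
        g = rng.standard_normal(w.shape)  # stand-in for a real gradient
        muon_step(w, g, m)
        ema_update(ema, {"w": w}, decay=0.9)
    # Singular values of an orthogonalized update should all sit near 1
    print(np.linalg.svd(newton_schulz_orthogonalize(g), compute_uv=False))
```

In this form, Muon applies only to 2-D weight matrices; in the reference setup, biases and other non-matrix parameters are typically left to AdamW. The EMA copy is the one usually used for evaluation, while training continues on the raw weights.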