We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam's, but its predictive uncertainty is better. We show several new use cases of IVON, where we improve fine-tuning and model merging in large language models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence in support of the effectiveness of variational learning.