We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam's, but its predictive uncertainty is better. We show several new use cases of IVON, where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective.
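For context, the sketch below shows how an optimizer like IVON can act as a near drop-in replacement for Adam in a standard PyTorch training loop: each step samples weights from the current Gaussian posterior, backpropagates at the sample, and then updates the posterior mean and variance; at test time, predictions are averaged over posterior samples to obtain uncertainty. The `ivon` package name, the `IVON` class, its `lr`/`ess` arguments, and the `sampled_params` context manager follow the authors' released code (github.com/team-approx-bayes/ivon) as best as can be reconstructed here, and should be treated as assumptions rather than a definitive API.

```python
# Minimal sketch: IVON as a drop-in replacement for Adam in PyTorch.
# API names (ivon.IVON, sampled_params, ess) are assumptions based on
# the authors' released package, not a verified interface.
import torch
import torch.nn.functional as F
import ivon  # assumed package from github.com/team-approx-bayes/ivon

# Toy data and model, purely for illustration.
X = torch.randn(512, 20)
y = (X.sum(dim=1) > 0).long()
model = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)

# ess ("effective sample size") controls how concentrated the posterior
# is; setting it to the dataset size is assumed here as a default.
optimizer = ivon.IVON(model.parameters(), lr=0.1, ess=len(X))

for step in range(100):
    # Draw one weight sample from the posterior, compute the gradient
    # there, then update the posterior mean and variance.
    with optimizer.sampled_params(train=True):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(X), y)
        loss.backward()
    optimizer.step()

# Predictive uncertainty: average softmax outputs over posterior samples.
with torch.no_grad():
    samples = []
    for _ in range(10):
        with optimizer.sampled_params():  # one posterior weight sample
            samples.append(F.softmax(model(X), dim=1))
    probs = torch.stack(samples).mean(dim=0)
```

The training loop differs from Adam only in the `sampled_params` context around the forward/backward pass, which is consistent with the abstract's claim that IVON's computational cost is nearly identical to Adam's while additionally providing predictive uncertainty.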