Variational inference (VI) provides a principled framework for estimating posterior distributions over model parameters, enabling explicit modeling of weight uncertainty during optimization. By capturing this uncertainty, VI improves the reliability of predictions and yields better-calibrated outputs. In this work, we investigate the benefits of VI for challenging multimodal understanding and reasoning by applying Improved Variational Online Newton (IVON), a recent VI optimizer, to fine-tune a multimodal large language model on audio question answering tasks. Our results show that VI not only enhances predictive accuracy but also significantly improves calibration, reducing the model's overconfidence. These gains in turn support risk-sensitive applications such as selective prediction, where reliable confidence estimates are crucial.
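
To make the notions of calibration and selective prediction concrete, the following is a minimal sketch under stated assumptions, not the evaluation code of this work. It computes a binned expected calibration error (ECE), a commonly used calibration metric that may differ from the one reported here, and the coverage and accuracy obtained when the model answers only above a confidence threshold. The function names, the bin count, and the toy confidences are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: names and toy data are assumptions, not from the paper.

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| over equal-width bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


def selective_accuracy(confidences, correct, threshold):
    """Answer only when confidence >= threshold.

    Returns (coverage, accuracy on answered items); accuracy is None if nothing is answered.
    """
    kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    if not kept:
        return 0.0, None
    return len(kept) / len(confidences), sum(kept) / len(kept)


# Hypothetical per-question confidences and correctness flags.
toy_confidences = [0.95, 0.90, 0.85, 0.60, 0.55]
toy_correct = [True, True, False, True, False]

print("ECE:", expected_calibration_error(toy_confidences, toy_correct))
print("coverage, accuracy at 0.9:", selective_accuracy(toy_confidences, toy_correct, 0.9))
```

A lower ECE indicates that stated confidences track empirical accuracy more closely. In selective prediction, raising the threshold trades coverage for accuracy on the answered subset, which is only useful when the confidence estimates are reliable.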