Transfer learning (TL) is an increasingly popular approach to training deep learning (DL) models. It reuses the knowledge gained by training a foundation model on diverse, large-scale datasets for downstream tasks where less domain- or task-specific data is available. The literature is rich with TL techniques and applications. However, most of this research uses deterministic DL models, which are often poorly calibrated and cannot communicate a measure of epistemic (model) uncertainty in their predictions. Unlike their deterministic counterparts, Bayesian DL (BDL) models are often well calibrated, provide access to the epistemic uncertainty of each prediction, and can achieve competitive predictive performance. In this study, we propose variational inference pre-trained audio neural networks (VI-PANNs). VI-PANNs are a variational inference variant of the popular ResNet-54 architecture, pre-trained on AudioSet, a large-scale audio event detection dataset. We evaluate the quality of the resulting uncertainty when transferring knowledge from VI-PANNs to other downstream acoustic classification tasks using the ESC-50, UrbanSound8K, and DCASE2013 datasets. We demonstrate, for the first time, that it is possible to transfer calibrated uncertainty information along with knowledge from upstream tasks to enhance a model's capability to perform downstream tasks.