Transfer learning (TL) is an increasingly popular approach to training deep learning (DL) models: knowledge gained by training a foundation model on diverse, large-scale datasets is reused on downstream tasks for which less domain- or task-specific data is available. The literature is rich with TL techniques and applications; however, most of this research uses deterministic DL models, which are often uncalibrated and cannot communicate a measure of epistemic (model) uncertainty in their predictions. Unlike their deterministic counterparts, Bayesian DL (BDL) models are often well calibrated, provide access to epistemic uncertainty for each prediction, and can achieve competitive predictive performance. In this study, we propose variational inference pre-trained audio neural networks (VI-PANNs). VI-PANNs are a variational inference variant of the popular ResNet-54 architecture, pre-trained on AudioSet, a large-scale audio event detection dataset. We evaluate the quality of the resulting uncertainty when transferring knowledge from VI-PANNs to downstream acoustic classification tasks using the ESC-50, UrbanSound8K, and DCASE2013 datasets. We demonstrate, for the first time, that calibrated uncertainty information can be transferred along with knowledge from upstream tasks to enhance a model's ability to perform downstream tasks.