Transfer learning is a crucial concept within deep learning that allows artificial neural networks to benefit from a large pre-training dataset when confronted with a task for which only limited data is available. Despite its ubiquitous use and clear benefits, many open questions remain about the inner workings of transfer learning and, in particular, about when and how well it works. To that end, we perform a rigorous study focusing on audio-to-audio transfer learning, in which we pre-train various model states on (ontology-based) subsets of AudioSet and fine-tune them on three computer audition tasks, namely acoustic scene recognition, bird activity recognition, and speech command recognition. We report that increasing the number of samples and increasing the number of classes in the pre-training data each have a positive impact on transfer learning. This effect is, however, generally surpassed by the effect of similarity between the pre-training data and the downstream task, which can lead the model to learn comparable features.