Exploiting the structure of large datasets to pre-train deep learning models is a promising strategy for reducing the need for supervised data. Self-supervised learning methods, such as contrastive learning and its variants, offer a way to obtain better representations in many deep learning applications. Soundscape ecology is one application in which annotations are expensive and scarce, so it is worth investigating how closely methods that do not require annotations can approach those that rely on supervision. Our study uses the Barlow Twins and VICReg methods to pre-train different models on the same small dataset of bird and anuran sound patterns. In a downstream task of classifying those animal species, the models obtained results close to those of supervised models pre-trained on large generic datasets and fine-tuned on the same task.