Perch is a performant pre-trained model for bioacoustics. It was trained in a supervised fashion and provides both off-the-shelf classification scores for thousands of vocalizing species and strong embeddings for transfer learning. In this new release, Perch 2.0, we expand from training exclusively on avian species to training on a large multi-taxa dataset. The model is trained with self-distillation using a prototype-learning classifier, together with a new source-prediction training criterion. Perch 2.0 obtains state-of-the-art performance on the BirdSet and BEANS benchmarks. It also outperforms specialized marine models on marine transfer learning tasks, despite having almost no marine training data. We present hypotheses as to why fine-grained species classification is a particularly robust pre-training task for bioacoustics.