Advances in passive acoustic monitoring and machine learning have produced vast datasets for computational bioacoustic research. Nevertheless, data scarcity remains an issue for rare and underrepresented species. This study investigates how meta-information can improve zero-shot audio classification, using bird species as a case study because rich and diverse metadata are available for them. We investigate three sources of metadata: textual descriptions of bird sounds encoded via (S)BERT, functional traits (AVONET), and bird life-history (BLH) characteristics. As audio features, we extract audio spectrogram transformer (AST) embeddings and project them to the dimension of the auxiliary information with a single linear layer. We then employ the dot product as the compatibility function and a standard zero-shot learning ranking hinge loss to determine the correct class. The best results are achieved by concatenating the AVONET and BLH features, attaining a mean unweighted F1-score of 0.233 over five different test sets with 8 to 10 classes.
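The scoring pipeline described in the abstract (linear projection, dot-product compatibility, ranking hinge loss) can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so the version below assumes a common formulation: a sum over wrong classes with a margin of 1.0. The omission of a bias term, the function names, and the toy vectors are likewise hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def project(audio_emb: Vector, weights: Sequence[Vector]) -> List[float]:
    """Single linear layer (no bias, an assumption): one weight row per auxiliary dimension."""
    return [sum(w * x for w, x in zip(row, audio_emb)) for row in weights]


def compatibility(projected: Vector, class_embs: Sequence[Vector]) -> List[float]:
    """Dot product between the projected audio embedding and each class's auxiliary vector."""
    return [sum(p * c for p, c in zip(projected, emb)) for emb in class_embs]


def ranking_hinge_loss(scores: Vector, true_class: int, margin: float = 1.0) -> float:
    """Assumed form: sum over wrong classes of max(0, margin + s_wrong - s_true)."""
    s_true = scores[true_class]
    return sum(
        max(0.0, margin + s - s_true)
        for i, s in enumerate(scores)
        if i != true_class
    )


def predict(scores: Vector) -> int:
    """Return the index of the class with the highest compatibility score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy example: a 3-dimensional "audio embedding" projected to a 2-dimensional auxiliary space.
audio = [1.0, 0.0, 2.0]
weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical projection weights
class_embs = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical class vectors (e.g. trait features)

scores = compatibility(project(audio, weights), class_embs)
predicted = predict(scores)
loss_if_correct = ranking_hinge_loss(scores, true_class=1)
loss_if_wrong = ranking_hinge_loss(scores, true_class=0)
```

Because classes are represented only by their auxiliary vectors, unseen classes can be scored at test time by swapping in their vectors for `class_embs` without changing the projection weights.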