In this paper, we propose a model for bird sound event detection that targets the scarce training samples typical of everyday long-tailed distributions. To this end, we study bird sound detection under the few-shot learning paradigm. Integrating channel and spatial attention mechanisms allows better feature representations to be learned from few-shot training data. We develop the Metric Channel-Spatial Network model by incorporating a Channel-Spatial Squeeze-Excitation block, which combines these two attention mechanisms, into the prototypical network. We evaluate the Metric Channel-Spatial Network on the DCASE 2022 Task 5 benchmark dataset, achieving an F-measure of 66.84% and a PSDS of 58.98%. Our experiments demonstrate that combining channel and spatial attention mechanisms effectively improves the performance of bird sound classification and detection.
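The abstract does not specify how the channel and spatial gates are combined or how the prototypical network produces embeddings, so the following is only a minimal NumPy sketch under stated assumptions. It assumes the two squeeze-excitation branches run in parallel and are summed (the concurrent "scSE" layout), uses a hypothetical `embed` head that average-pools the recalibrated feature map, and applies the standard prototypical-network rule: each class prototype is the mean support embedding, and queries go to the nearest prototype by squared Euclidean distance. The function names, shapes, reduction ratio `r` and random, untrained weights are all illustrative, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_spatial_se(x, w1, w2, w_s):
    """Recalibrate a (C, H, W) feature map with channel and spatial gates.

    Assumption: channel SE and spatial SE run in parallel and their outputs
    are summed (scSE-style); the paper may combine them differently.
    """
    # Channel squeeze: global average pool over the time-frequency plane.
    z = x.mean(axis=(1, 2))                          # (C,)
    # Channel excitation: bottleneck MLP, one sigmoid gate per channel.
    s_c = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))      # (C,)
    # Spatial squeeze: a 1x1 convolution collapses channels to one map.
    s_s = sigmoid(np.einsum("c,chw->hw", w_s, x))    # (H, W)
    return x * s_c[:, None, None] + x * s_s[None, :, :]


def embed(x, params):
    # Hypothetical embedding head: SE recalibration, then global average pool.
    return channel_spatial_se(x, *params).mean(axis=(1, 2))


def class_prototypes(support_emb, support_labels, n_classes):
    # Prototype of class k = mean embedding of its support examples.
    return np.stack([support_emb[support_labels == k].mean(axis=0)
                     for k in range(n_classes)])


def nearest_prototype(query_emb, protos):
    # Squared Euclidean distance from every query to every prototype.
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1), d


rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 6, 2                                   # r: reduction ratio
params = (rng.normal(size=(C // r, C)),                   # w1: C -> C/r
          rng.normal(size=(C, C // r)),                   # w2: C/r -> C
          rng.normal(size=C))                             # w_s: 1x1 conv weights

# A 2-way 5-shot episode on random non-negative "spectrogram features".
support_x = rng.random(size=(10, C, H, W))
support_y = np.array([0] * 5 + [1] * 5)
support_emb = np.stack([embed(x, params) for x in support_x])
protos = class_prototypes(support_emb, support_y, n_classes=2)
pred, dist = nearest_prototype(support_emb, protos)
print(pred)
```

Because both gates lie in (0, 1), each branch only rescales the input, so for non-negative features the block output stays between 0 and twice the input. In a real detector, the weights would be trained episodically and the query set would come from unlabeled audio segments rather than from the support set.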