Weakly supervised learning has emerged as a promising approach for exploiting limited labeled data in various domains, bridging the gap between fully supervised and unsupervised techniques. Acquiring strong annotations (with event onset and offset times) for sound event detection is prohibitively expensive, which makes weakly supervised learning a more cost-effective and broadly applicable alternative. To improve the recognition rate of weakly supervised sound event detection, we introduce a Frame Pairwise Distance (FPD) loss branch, complemented by a small amount of synthesized data, and propose the corresponding sampling and label processing strategies. Two distinct distance metrics are used to evaluate the proposed approach. Finally, the method is validated on the standard DCASE dataset, and the experimental results confirm its effectiveness.
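The abstract does not specify the exact form of the FPD loss or which two distance metrics are used. Purely as an illustration, the sketch below shows one generic way a frame-level pairwise distance loss can be built. It assumes a contrastive formulation (pull together frames with identical label vectors, push apart frames with different ones up to a margin), Euclidean and cosine distance as the two example metrics, and frame-level multi-hot labels such as those available for synthesized clips. The function names and the `margin` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def pairwise_distances(x, metric="euclidean"):
    """Return a (T, T) distance matrix between frame embeddings x of shape (T, D).

    Hypothetical helper: Euclidean and cosine are assumed example metrics.
    """
    if metric == "euclidean":
        diff = x[:, None, :] - x[None, :, :]
        return np.sqrt(np.sum(diff ** 2, axis=-1))
    if metric == "cosine":
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        unit = x / np.maximum(norms, 1e-12)  # avoid division by zero for all-zero frames
        return 1.0 - unit @ unit.T
    raise ValueError(f"unknown metric: {metric}")


def frame_pairwise_distance_loss(x, labels, metric="euclidean", margin=1.0):
    """Contrastive-style loss averaged over all frame pairs (i < j).

    x:      (T, D) frame embeddings.
    labels: (T, C) multi-hot frame labels (assumed known, e.g. for synthesized clips).
    Pairs with identical label vectors are penalized by their squared distance;
    pairs with different label vectors are penalized only if closer than `margin`.
    """
    d = pairwise_distances(x, metric)
    same = np.all(labels[:, None, :] == labels[None, :, :], axis=-1)
    iu = np.triu_indices(len(labels), k=1)  # each unordered pair once, no diagonal
    if len(iu[0]) == 0:
        return 0.0
    d, same = d[iu], same[iu]
    pos = same * d ** 2
    neg = (~same) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(pos + neg))
```

For example, two silent frames labeled `[1, 0]` and a distant frame labeled `[0, 1]` already satisfy the objective, so the loss is 0. In a real system this term would be computed on network embeddings with an autodiff framework and added to the clip-level weak-label loss; that combination is likewise an assumption here.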