In this work we propose an audio recording segmentation method based on adaptive change point detection (A-CPD) for machine-guided weak label annotation of audio recording segments. The goal is to maximize the amount of information gained about the temporal activations of the target sounds. For each unlabeled audio recording, we use a prediction model to derive a probability curve that guides annotation. The prediction model is initially pre-trained on available annotated sound event data whose classes are disjoint from the classes in the unlabeled dataset. The prediction model then gradually adapts to the annotations provided by the annotator in an active learning loop. The queries used to guide the weak label annotator towards strong labels are derived using change point detection on these probabilities. We show that it is possible to derive strong labels of high quality even with a limited annotation budget, and show favorable results for A-CPD when compared to two baseline query strategies.
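As a rough illustration of how queries might be derived from a probability curve, the sketch below cuts a recording at the frames where the predicted probability changes the most and returns the resulting segments as queries for the annotator. This is not the A-CPD procedure itself: the function name `change_point_queries`, the absolute first difference used as the change score, and the fixed number of queries per recording are all assumptions made for the example.

```python
import numpy as np


def change_point_queries(probs, n_queries, duration):
    """Split a recording into at most n_queries segments at the largest probability jumps.

    Hypothetical helper: `probs` is a per-frame probability curve, `duration`
    is the recording length in seconds. The change score (absolute first
    difference) is an assumption, not the detector used in the paper.
    """
    probs = np.asarray(probs, dtype=float)
    n_frames = len(probs)
    # scores[i] measures the change between frame i and frame i + 1.
    scores = np.abs(np.diff(probs))
    # n_queries segments need n_queries - 1 cuts (fewer if the curve is short).
    n_cuts = min(n_queries - 1, len(scores))
    # Keep the n_cuts largest changes; +1 places the cut at the start of frame i + 1.
    order = np.argsort(scores, kind="stable")[::-1]
    cut_frames = np.sort(order[:n_cuts] + 1)
    # Convert frame indices to seconds and add the recording boundaries.
    frame_len = duration / n_frames
    bounds = np.concatenate(([0.0], cut_frames * frame_len, [duration]))
    return [(float(a), float(b)) for a, b in zip(bounds[:-1], bounds[1:])]


# Example: one event active in the middle of an 8-second recording.
queries = change_point_queries([0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1], 3, 8.0)
```

Each returned `(start, end)` pair would be presented to the annotator for a weak (presence or absence) label; when the cuts fall close to the true event boundaries, the weak labels on these segments approximate strong labels.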