Data collection and annotation are laborious, time-consuming prerequisites for supervised machine learning tasks. Online Active Learning (OAL) is a paradigm that addresses this issue by simultaneously minimizing the amount of annotation required to train a classifier and adapting to changes in the data over the course of the data collection process. Prior work has indicated that fluctuating class distributions and data drift remain common problems for OAL. This work presents new loss functions that address these challenges when OAL is applied to Sound Event Detection (SED). Experimental results on the SONYC dataset and two Voice-Type Discrimination (VTD) corpora indicate that OAL can reduce the time and effort required to train SED classifiers by a factor of 5 for SONYC, and that the new methods presented here successfully resolve issues present in existing OAL methods.