Detecting and analyzing infant cry and snoring events are important tasks in audio signal processing. Although datasets for general sound event detection are plentiful, they rarely provide enough strongly labeled data (that is, data annotated with event onset and offset times) that is specific to infant cries and snoring. To provide a benchmark and thereby foster research on infant cry and snoring detection, this paper introduces the Infant Cry and Snoring Detection (ICSD) dataset, a novel, publicly available dataset designed specifically for this task. ICSD comprises three types of subsets: a real strongly labeled subset with manually annotated event-based labels, a weakly labeled subset with only clip-level event annotations, and a synthetic subset that is generated together with strong annotations. We describe the ICSD creation process in detail, including the challenges encountered and the solutions adopted. We also provide a comprehensive characterization of the dataset and discuss its limitations and the key factors to consider when using it. In addition, we conduct extensive experiments on ICSD to establish baseline systems and to offer insights into the main factors to consider when using this dataset for infant cry and snoring detection research. We hope ICSD will be widely adopted by the community as a new open benchmark for future research in this area.
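The difference between the strongly labeled and weakly labeled subsets can be illustrated with a short sketch. It assumes a DCASE-style annotation layout in which each strong label is one row holding a filename, an onset and offset in seconds, and an event label. The field names, the class names ("cry", "snore"), and the helpers `strong_to_weak` and `frame_targets` are hypothetical and are not taken from the ICSD release. The sketch shows that a weak (clip-level) label can be derived from strong labels by discarding timing, and that strong labels can be rasterized into frame-level training targets.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StrongLabel:
    """One event-based annotation (hypothetical layout): file, onset/offset in seconds, class."""
    filename: str
    onset: float
    offset: float
    event_label: str


def strong_to_weak(events):
    """Collapse event-level (strong) labels into clip-level (weak) labels.

    A weak label only records which classes occur somewhere in a clip,
    so onset and offset information is discarded.
    """
    weak = defaultdict(set)
    for ev in events:
        weak[ev.filename].add(ev.event_label)
    # Sort class names so the output is deterministic.
    return {fname: sorted(labels) for fname, labels in weak.items()}


def frame_targets(events, filename, classes, clip_seconds, hop_seconds):
    """Rasterize the strong labels of one clip into a frames-by-classes 0/1 matrix.

    A frame is marked active for a class if it overlaps the event interval.
    """
    n_frames = int(round(clip_seconds / hop_seconds))
    targets = [[0] * len(classes) for _ in range(n_frames)]
    for ev in events:
        if ev.filename != filename:
            continue
        c = classes.index(ev.event_label)
        start = max(0, int(ev.onset / hop_seconds))
        end = min(n_frames, math.ceil(ev.offset / hop_seconds))
        for t in range(start, end):
            targets[t][c] = 1
    return targets
```

For example, with hypothetical annotations `[StrongLabel("clip_001.wav", 0.5, 2.0, "snore"), StrongLabel("clip_001.wav", 3.1, 5.4, "cry")]`, `strong_to_weak` returns `{"clip_001.wav": ["cry", "snore"]}`: the weak label still says both events occur, but no longer says when. This loss of timing is why strongly labeled data is the scarcer and more valuable resource for event detection.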