ICSD: An Open-source Dataset for Infant Cry and Snoring Detection

The detection and analysis of infant cry and snoring events are crucial tasks in audio signal processing. Although datasets for general sound event detection are plentiful, they rarely provide enough strongly labeled data specific to infant cries and snoring. To supply a benchmark and thereby foster research on infant cry and snoring detection, this paper introduces the Infant Cry and Snoring Detection (ICSD) dataset, a novel, publicly available dataset designed specifically for ICSD tasks. ICSD comprises three types of subsets: a real strongly labeled subset with manually annotated event-based labels, a weakly labeled subset with only clip-level event annotations, and a synthetic subset that is generated and labeled with strong annotations. This paper describes the ICSD creation process in detail, including the challenges encountered and the solutions adopted. We offer a comprehensive characterization of the dataset and discuss its limitations and the key considerations for its use. Additionally, we conduct extensive experiments on the ICSD dataset to establish baseline systems and to offer insights into the main factors to consider when using this dataset for ICSD research. Our goal is to develop a dataset that the community will widely adopt as a new open benchmark for future ICSD research.
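
The difference between the strong (event-based) and weak (clip-level) annotations mentioned above can be illustrated with a minimal sketch. The abstract does not specify the dataset's file format or label names, so the `StrongEvent` class, the `to_weak_labels` helper, and the label strings `infant_cry` and `snoring` below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrongEvent:
    """One strong (event-based) annotation: a class label plus onset and offset in seconds."""
    label: str    # hypothetical label name, not taken from the dataset
    onset: float
    offset: float


def to_weak_labels(events):
    """Collapse strong annotations into a weak (clip-level) annotation.

    A weak label only records which event classes occur somewhere in the
    clip, so all timing information is discarded.
    """
    return sorted({event.label for event in events})


# A hypothetical 10-second clip with three annotated events.
clip_events = [
    StrongEvent("infant_cry", 0.8, 3.2),
    StrongEvent("snoring", 5.0, 6.5),
    StrongEvent("infant_cry", 7.1, 9.4),
]

weak = to_weak_labels(clip_events)
print(weak)  # ['infant_cry', 'snoring']
```

The conversion works in one direction only: clip-level labels can be derived from event-level ones, but onsets and offsets cannot be recovered from a weak label.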

