Machine Listening develops technologies that extract relevant information from audio signals. A critical part of any such project is acquiring and labeling contextualized data, a task that is inherently complex and requires dedicated resources and strategies. Although some audio datasets are available, many are unsuitable for commercial applications. The paper argues for Active Learning (AL) with expert labelers rather than crowdsourcing, because crowdsourcing often lacks detailed insight into the structure of the dataset. AL is an iterative process in which human labelers and AI models work together: samples are selected intelligently for human review, so that the labeling budget is spent where it is most useful. This approach addresses the challenge of handling large, constantly growing datasets that exceed the available computational resources and memory. The paper presents a comprehensive data-centric framework for Machine Listening projects, covering the configuration of recording nodes, the database structure, and labeling-budget optimization in resource-constrained scenarios. Applied to an industrial port in Valencia, Spain, the framework enabled a small team to label 6540 ten-second audio samples over five months, demonstrating its effectiveness and its adaptability to different levels of resource availability.
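To make the sample-selection step of Active Learning concrete, the following is a minimal sketch of one labeling round. It assumes entropy-based uncertainty sampling, a common AL strategy that is used here only for illustration; the summary does not state which selection criterion the paper uses. The function names, clip identifiers, and class probabilities are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(
    predictions: Dict[str, Sequence[float]],
    budget: int,
) -> List[str]:
    """Return the ids of the `budget` most uncertain unlabeled clips.

    Assumption: uncertainty is measured by prediction entropy. This is an
    illustrative criterion, not necessarily the one used in the paper.
    """
    ranked = sorted(
        predictions,
        key=lambda clip_id: entropy(predictions[clip_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[: max(budget, 0)]


# Hypothetical model outputs for four unlabeled ten-second clips
# (three classes; values are made up for the example).
predictions = {
    "clip_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "clip_b": [0.34, 0.33, 0.33],  # near-uniform, highest entropy
    "clip_c": [0.70, 0.20, 0.10],
    "clip_d": [0.50, 0.50, 0.00],
}

# With a budget of two labels in this round, the two most uncertain
# clips are sent to the expert labelers.
selected = select_for_labeling(predictions, budget=2)
print(selected)  # ['clip_b', 'clip_c']
```

In a full loop, the expert labels for the selected clips would be added to the training set, the model retrained, and the selection repeated until the labeling budget is exhausted. Ranking only a batch of predictions at a time is one way such a loop could cope with a dataset too large to fit in memory.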