Unmanned aerial vehicles (UAVs), or drones, are increasingly used in search and rescue missions to detect human presence. Existing systems rely primarily on vision-based methods, which are prone to failure under low visibility or occlusion. Drone-based audio perception is promising but suffers from extreme ego-noise that masks the sounds indicating human presence. Existing datasets are either limited in diversity or synthetic, lacking real acoustic interactions, and there are no standardized setups for drone audition. To this end, we present DroneAudioset, a comprehensive drone audition dataset featuring 23.5 hours of annotated recordings that cover a wide range of signal-to-noise ratios (SNRs), from -57.2 dB to -2.5 dB, across various drone types, throttle settings, microphone configurations, and environments. The dataset enables the development and systematic evaluation of noise suppression and classification methods for human-presence detection under challenging conditions. It also informs practical design considerations for drone audition systems, such as microphone placement trade-offs, and the development of drone-noise-aware audio processing. This dataset is an important step toward enabling the design and deployment of drone audition systems. The dataset is publicly available at https://huggingface.co/datasets/ahlab-drone-project/DroneAudioSet/ under the MIT license.
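To give a sense of scale for the reported SNR range, the sketch below uses the standard power-ratio definition, SNR = 10 log10(P_signal / P_noise). The abstract does not state how the SNR values were measured, so this definition, the helper name `snr_db`, and the synthetic sine-wave clips are assumptions made purely for illustration. Under this definition, -57.2 dB corresponds to noise power roughly 5 × 10^5 times the source power, while -2.5 dB corresponds to noise power about 1.8 times the source power.

```python
import numpy as np


def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """SNR in dB from separately available source and noise clips.

    Hypothetical helper: assumes the standard mean-power definition,
    which may differ from the procedure used to annotate the dataset.
    """
    p_signal = np.mean(signal.astype(np.float64) ** 2)
    p_noise = np.mean(noise.astype(np.float64) ** 2)
    return float(10.0 * np.log10(p_signal / p_noise))


# Synthetic stand-ins (not dataset audio): one second at 16 kHz.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate

# Unit-amplitude tone standing in for drone ego-noise.
ego_noise = np.sin(2 * np.pi * 200 * t)

# Source tone scaled so that its level sits 57.2 dB below the noise.
amplitude = 10 ** (-57.2 / 20)
source = amplitude * np.sin(2 * np.pi * 440 * t)

print(f"SNR: {snr_db(source, ego_noise):.1f} dB")
```

At the low end of the range the source amplitude is about 0.0014 times the noise amplitude, which illustrates why noise suppression is a prerequisite for detection in such recordings.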