Recent progress in audio generation models has made it possible to create highly realistic and immersive soundscapes, which are now widely used in film and virtual reality applications. However, these audio generators also raise concerns about misuse, such as producing deceptive audio for fabricated videos or spreading misleading information. It is therefore essential to develop effective methods for detecting fake environmental sounds. Existing datasets for environmental sound deepfake detection (ESDD) remain limited in both scale and the diversity of sound categories they cover. To address this gap, we introduced EnvSDD, the first large-scale curated dataset designed for ESDD. Based on EnvSDD, we launched the ESDD Challenge, recognized as one of the ICASSP 2026 Grand Challenges. This paper presents an overview of the ESDD Challenge, including a detailed analysis of the challenge results.