Advances in generative AI have enabled the creation of synthetic audio that is perceptually indistinguishable from real, genuine audio. Although this rapid progress enables many positive applications, it also raises risks of misuse, such as impersonation, disinformation, and fraud. Despite a growing number of open-source fake audio detection codebases released through numerous challenges and initiatives, most are tailored to specific competitions, datasets, or models. A standardized and unified toolkit that supports fair benchmarking and comparison of competing solutions, with not only common databases, protocols, and metrics but also a shared codebase, is missing. To address this, we propose WeDefense, the first open-source toolkit to support both fake audio detection and localization. Beyond model training, WeDefense emphasizes critical yet often overlooked components: flexible input and augmentation, calibration, score fusion, standardized evaluation metrics, and analysis tools for deeper understanding and interpretation. The toolkit is publicly available at https://github.com/zlin0/wedefense with interactive demos for fake audio detection and localization.