EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection

The growing prevalence of speech deepfakes has raised serious concerns, particularly in real-world scenarios such as telephone fraud and identity theft. While many anti-spoofing systems have demonstrated promising performance on lab-generated synthetic speech, they often fail when confronted with physical replay attacks, a common and low-cost form of attack used in practical settings. Our experiments show that models trained on existing datasets exhibit severe performance degradation, with average accuracy dropping to 59.6% when evaluated on replayed audio. To bridge this gap, we present EchoFake, a comprehensive dataset comprising more than 120 hours of audio from over 13,000 speakers, featuring both cutting-edge zero-shot text-to-speech (TTS) speech and physical replay recordings collected under varied devices and real-world environmental settings. Additionally, we evaluate three baseline detection models and show that models trained on EchoFake achieve lower average equal error rates (EERs) across datasets, indicating better generalization. By introducing more practical challenges relevant to real-world deployment, EchoFake offers a more realistic foundation for advancing spoofing detection methods.
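
For readers unfamiliar with the metric, the EER is the error rate at the decision threshold where the false acceptance rate (spoofed audio accepted as bona fide) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of that computation, not the evaluation code behind the reported results. The function name `compute_eer`, the convention that higher scores mean "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for two lists of detector scores.

    Assumption: a higher score means "more likely bona fide", and a
    trial is accepted when its score is >= the threshold.
    """
    # Every observed score is a candidate threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest;
        # report their mean as the EER estimate.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = compute_eer(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

On finite score sets the two error rates rarely cross exactly, so this sketch takes the threshold where they are closest and averages them; evaluation toolkits often interpolate along the ROC curve instead, which can give slightly different values.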

