The proliferation of deepfakes poses a serious challenge to the integrity of information dissemination. Audio deepfakes in particular can be deceptively realistic, which makes them a significant risk in misinformation campaigns. To address this threat, we introduce the Multi-Feature Audio Authenticity Network (MFAAN), an architecture designed to detect fabricated audio content. MFAAN comprises multiple parallel paths, each built to exploit the strengths of a different audio representation, including Mel-frequency cepstral coefficients (MFCC), linear-frequency cepstral coefficients (LFCC), and Chroma Short Time Fourier Transform (Chroma-STFT). By fusing these features, MFAAN builds a richer representation of the audio, enabling robust discrimination between genuine and manipulated recordings. Preliminary evaluations on two benchmark datasets, 'In-the-Wild' Audio Deepfake Data and The Fake-or-Real Dataset, yield accuracies of 98.93% and 94.47%, respectively. These results underscore the efficacy of MFAAN and highlight its potential as a tool for combating deepfake audio content.
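The parallel-path idea can be illustrated with a minimal sketch. This is not the MFAAN implementation: the abstract does not specify layer types, layer sizes, or the fusion operator, so the single dense layer per branch, the concatenation-based fusion, the feature dimensions (13, 13, 12), and every name below (`MultiBranchClassifier`, `dense`, `init_layer`) are assumptions made purely for illustration. Random weights stand in for trained parameters, and random vectors stand in for real per-clip features.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer followed by ReLU; returns one value per weight row."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights (a placeholder for trained parameters)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


class MultiBranchClassifier:
    """Hypothetical multi-path model: one branch per feature type,
    fused by concatenation (an assumed fusion strategy)."""

    def __init__(self, feature_dims, hidden=8, seed=0):
        rng = random.Random(seed)
        # One independent branch per named feature representation.
        self.branches = {name: init_layer(dim, hidden, rng)
                         for name, dim in feature_dims.items()}
        # The fusion head operates on the concatenated branch outputs.
        n_fused = hidden * len(feature_dims)
        self.head_w = [rng.uniform(-0.5, 0.5) for _ in range(n_fused)]
        self.head_b = 0.0

    def predict_proba(self, features):
        """Return a sigmoid score for one clip's feature vectors."""
        fused = []
        for name, (weights, bias) in self.branches.items():
            fused.extend(dense(features[name], weights, bias))
        logit = sum(w * v for w, v in zip(self.head_w, fused)) + self.head_b
        return 1.0 / (1.0 + math.exp(-logit))


# Assumed per-clip feature sizes; real systems would compute these
# vectors from audio rather than draw them at random.
dims = {"mfcc": 13, "lfcc": 13, "chroma": 12}
model = MultiBranchClassifier(dims)
rng = random.Random(1)
clip = {name: [rng.random() for _ in range(d)] for name, d in dims.items()}
score = model.predict_proba(clip)
```

Keeping each representation in its own branch lets every path learn a transformation suited to that feature before the fused vector reaches the classifier head. Whether the score denotes "genuine" or "fake" is a labeling convention that the sketch leaves open.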