Advances in AIGC technologies have enabled the synthesis of highly realistic audio deepfakes capable of deceiving human auditory perception. Although numerous audio deepfake detection (ADD) methods have been developed, most rely on local temporal/spectral features or pairwise relations, overlooking high-order interactions (HOIs): discriminative patterns that emerge from the joint behavior of multiple feature components, beyond their individual contributions. We propose HyperPotter, a hypergraph-based framework that explicitly models these synergistic HOIs through clustering-based hyperedges with class-aware prototype initialization. Extensive experiments show that HyperPotter surpasses its baseline by an average relative gain of 22.15% across 11 datasets and outperforms state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets, demonstrating superior generalization to diverse attacks and speakers.
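To make the core mechanism concrete, the sketch below illustrates one plausible reading of clustering-based hyperedge construction: feature tokens are grouped by a few k-means iterations seeded from class-aware prototypes, yielding an incidence matrix whose hyperedges connect many tokens at once; one round of node-to-hyperedge-and-back aggregation then mixes each token with all members of its hyperedge, a high-order (not merely pairwise) interaction. The function names, the prototype-seeding scheme, and the aggregation rule are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np


def build_hyperedges(features, prototypes, n_iters=5):
    """Group feature tokens into hyperedges via k-means.

    features:   (N, D) feature components extracted from one utterance
    prototypes: (K, D) class-aware initial centroids (hypothetical:
                e.g. mean real/fake embeddings learned beforehand)
    Returns an (N, K) binary incidence matrix H with H[i, k] = 1
    when token i belongs to hyperedge k.
    """
    centroids = prototypes.astype(float).copy()
    assign = np.zeros(len(features), dtype=int)
    for _ in range(n_iters):
        # Assign each token to its nearest centroid.
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
        assign = dists.argmin(axis=1)
        # Update centroids; keep the old centroid if a cluster is empty.
        for k in range(len(centroids)):
            members = features[assign == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    H = np.zeros((len(features), len(centroids)))
    H[np.arange(len(features)), assign] = 1.0
    return H


def hypergraph_aggregate(features, H):
    """One round of node -> hyperedge -> node message passing.

    Each hyperedge pools its member tokens, then broadcasts the pooled
    representation back, so every token interacts with the whole group
    rather than with a single pairwise neighbor.
    """
    edge_size = np.maximum(H.sum(axis=0)[:, None], 1.0)
    edge_feat = (H.T @ features) / edge_size      # (K, D) hyperedge features
    return H @ edge_feat                          # (N, D) updated tokens
```

In a full model these steps would sit inside a learned layer (with projections and nonlinearities); the sketch only shows why hyperedges capture multi-component patterns that pairwise graph edges cannot.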