Advances in AI-generated content (AIGC) technologies have enabled the synthesis of highly realistic audio deepfakes capable of deceiving human listeners. Although numerous audio deepfake detection (ADD) methods have been developed, most rely on local temporal or spectral features or on pairwise relations, overlooking high-order interactions (HOIs). HOIs capture discriminative patterns that emerge jointly from multiple feature components and go beyond their individual contributions. We propose HyperPotter, a hypergraph-based framework that captures high-order relations associated with synergistic patterns through clustering-based hyperedges with class-aware prototype initialization. Extensive experiments on 13 test sets show that HyperPotter improves over the baseline on 11 of them, yielding an average relative equal error rate (EER) reduction of 12.68% across all test sets and 22.15% on the improved sets. These results demonstrate strong cross-scenario generalization, while also revealing robustness limits under severe codec or channel distortion.
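To make the idea of clustering-based hyperedges with class-aware prototype initialization concrete, the following is a minimal sketch and not HyperPotter's actual implementation. It assumes each node is a feature vector with a real/fake label available during training, seeds the cluster prototypes from samples of each class, refines them with a few k-means-style updates, and then forms one hyperedge per prototype by linking every node to its `top_m` nearest prototypes. The function names, the `per_class` and `top_m` parameters, and the mean-aggregation smoothing step are all illustrative assumptions.

```python
import numpy as np


def init_prototypes(x, y, per_class, rng):
    """Class-aware init (assumed scheme): draw `per_class` samples from each class."""
    protos = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        pick = rng.choice(idx, size=per_class, replace=False)
        protos.append(x[pick])
    return np.concatenate(protos, axis=0)


def sq_dists(x, p):
    """Squared Euclidean distance between every node and every prototype."""
    return ((x[:, None, :] - p[None, :, :]) ** 2).sum(axis=-1)


def build_incidence(x, y, per_class=2, top_m=2, iters=10, seed=0):
    """Return an (N, K) incidence matrix H with one hyperedge per prototype."""
    rng = np.random.default_rng(seed)
    protos = init_prototypes(x, y, per_class, rng)
    for _ in range(iters):  # plain k-means (Lloyd) refinement
        assign = sq_dists(x, protos).argmin(axis=1)
        for k in range(len(protos)):
            members = x[assign == k]
            if len(members):  # keep the old prototype if the cluster is empty
                protos[k] = members.mean(axis=0)
    # Each node joins the hyperedges of its top_m nearest prototypes,
    # so a hyperedge can connect many nodes at once (beyond pairwise links).
    nearest = np.argsort(sq_dists(x, protos), axis=1)[:, :top_m]
    h = np.zeros((len(x), len(protos)))
    h[np.arange(len(x))[:, None], nearest] = 1.0
    return h


def hypergraph_smooth(x, h):
    """One mean-aggregation pass: nodes -> hyperedges -> nodes."""
    edge_deg = np.maximum(h.sum(axis=0), 1.0)  # guard against empty hyperedges
    node_deg = h.sum(axis=1)                   # equals top_m for every node
    edge_feat = (h.T @ x) / edge_deg[:, None]
    return (h @ edge_feat) / node_deg[:, None]


# Toy data: 40 random 8-dimensional feature vectors, two classes (hypothetical).
rng = np.random.default_rng(0)
x = rng.normal(size=(40, 8))
y = np.repeat([0, 1], 20)

h = build_incidence(x, y)
out = hypergraph_smooth(x, h)
print(h.shape, out.shape)
```

With two prototypes per class, the incidence matrix has four hyperedges, and each node participates in two of them. Letting a node belong to several hyperedges is one simple way to let group-level structure influence its representation; a learned model would typically add trainable weights and nonlinearities around the aggregation step.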