Voice authentication systems deployed at the network edge face two threats: (a) sophisticated deepfake synthesis attacks and (b) control-plane poisoning of distributed federated learning protocols. We present a framework that couples physics-guided deepfake detection with uncertainty-aware edge learning. The framework fuses interpretable physics features that model vocal tract dynamics with representations from a self-supervised learning (SSL) module. The fused representations are processed by a Multi-Modal Ensemble Architecture, followed by a Bayesian ensemble that provides uncertainty estimates. By incorporating both physics-based feature evaluations and per-sample uncertainty estimates, the proposed framework remains robust to advanced deepfake attacks and to sophisticated control-plane poisoning, covering both halves of the threat model for networked voice authentication.
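The abstract does not specify how the Bayesian ensemble turns member outputs into an uncertainty estimate. One common choice for ensembles is to split the entropy of the averaged prediction into an expected-entropy (aleatoric) term and a mutual-information (epistemic, member-disagreement) term. The sketch below illustrates that decomposition for a binary spoof/bona fide decision. It is an assumption, not the authors' method: the function names `ensemble_uncertainty` and `decide`, the `max_epistemic` threshold, and the abstain-on-disagreement rule are all hypothetical.

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli(p) distribution; 0*log(0) is treated as 0."""
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            h -= q * math.log(q)
    return h


def ensemble_uncertainty(member_probs: Sequence[float]) -> dict:
    """Decompose uncertainty from M ensemble members' P(spoof | x).

    total     = H[mean_m p_m]       (predictive entropy)
    aleatoric = mean_m H[p_m]       (expected entropy)
    epistemic = total - aleatoric   (mutual information: member disagreement)
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    m = len(member_probs)
    mean_p = sum(member_probs) / m
    total = binary_entropy(mean_p)
    aleatoric = sum(binary_entropy(p) for p in member_probs) / m
    # Clamp tiny negative values caused by floating-point rounding.
    epistemic = max(total - aleatoric, 0.0)
    return {"p_spoof": mean_p, "total": total,
            "aleatoric": aleatoric, "epistemic": epistemic}


def decide(member_probs: Sequence[float], threshold: float = 0.5,
           max_epistemic: float = 0.1) -> str:
    """Hypothetical decision rule: abstain when ensemble members disagree too much."""
    u = ensemble_uncertainty(member_probs)
    if u["epistemic"] > max_epistemic:
        return "abstain"
    return "spoof" if u["p_spoof"] >= threshold else "bona fide"


if __name__ == "__main__":
    print(decide([0.9, 0.92, 0.88]))   # members agree on spoof
    print(decide([0.05, 0.95, 0.5]))   # members disagree, so the rule abstains
```

Under this sketch, agreeing members such as `[0.9, 0.92, 0.88]` yield near-zero epistemic uncertainty and a confident "spoof" decision. Conflicting members such as `[0.05, 0.95, 0.5]` average to 0.5 with a large mutual-information term, so the rule abstains. In a federated setting, a similar disagreement signal could plausibly be used to flag or down-weight suspicious inputs, but the abstract does not state how the framework applies it.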