Audio deepfake detection has achieved remarkable performance with diverse deep learning architectures such as ResNet, and has improved further with the introduction of large models (LMs) such as Wav2Vec. The success of large language models (LLMs) further demonstrates the benefits of scaling model parameters, but also exposes a bottleneck: performance gains are constrained by parameter count. Simply stacking additional layers, as done in current LLMs, is computationally expensive and requires full retraining. Furthermore, existing low-rank adaptation methods are applied primarily to attention-based architectures, which limits their scope. Inspired by the neuronal plasticity observed in mammalian brains, we propose two novel algorithms, dropin and, extending it, plasticity, that dynamically adjust the number of neurons in certain layers to flexibly modulate model parameters. We evaluate these algorithms on multiple architectures, including ResNet, Gated Recurrent Neural Networks, and Wav2Vec. Experiments on the widely recognised ASVSpoof2019 LA, ASVSpoof2019 PA, and FakeorReal datasets show consistent improvements in computational efficiency with the dropin approach, and maximum relative reductions in Equal Error Rate of around 39% and 66% across these datasets with the dropin and plasticity approaches, respectively. The code and supplementary material are available at GitHub link.
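The abstract does not spell out how neurons are added, so the following is only a minimal sketch of one way a layer could be widened without full retraining. It assumes a plain two-layer MLP in NumPy and a hypothetical helper `drop_in` that appends hidden neurons with zero-initialised outgoing weights, so the network output is unchanged immediately after the expansion. The initialisation scheme, the function names, and the layer sizes are illustrative assumptions, not the authors' method. The helper `relative_reduction` shows how a relative reduction in Equal Error Rate is computed.

```python
import numpy as np


def forward(x, w1, b1, w2, b2):
    """Two-layer MLP: ReLU hidden layer followed by a linear output layer."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2


def drop_in(w1, b1, w2, n_new, rng, scale=0.01):
    """Hypothetical helper: append n_new neurons to the hidden layer.

    Assumption (not taken from the paper): the new neurons get small random
    incoming weights and zero outgoing weights, so the output of the network
    is identical right after the expansion and only changes once training
    updates the new parameters.
    """
    in_dim, _ = w1.shape
    _, out_dim = w2.shape
    w1_new = np.concatenate(
        [w1, scale * rng.standard_normal((in_dim, n_new))], axis=1
    )
    b1_new = np.concatenate([b1, np.zeros(n_new)])
    w2_new = np.concatenate([w2, np.zeros((n_new, out_dim))], axis=0)
    return w1_new, b1_new, w2_new


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.39 for a 39% relative reduction."""
    return (baseline_eer - new_eer) / baseline_eer


# Illustrative sizes only: 8 inputs, 16 hidden neurons, 2 outputs.
rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
w2, b2 = rng.standard_normal((16, 2)), np.zeros(2)
x = rng.standard_normal((4, 8))

before = forward(x, w1, b1, w2, b2)
w1_big, b1_big, w2_big = drop_in(w1, b1, w2, n_new=4, rng=rng)
after = forward(x, w1_big, b1_big, w2_big, b2)
```

In this sketch the widened layer can be trained further while the original weights are kept or frozen. Whether the proposed dropin algorithm preserves the output in this way is not stated in the abstract.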