In this work, we devise a parameter-efficient solution that brings differential privacy (DP) guarantees to the adaptation of a cross-lingual speech classifier. We investigate a new adaptation framework, built on a frozen pre-trained model, for DP-preserving speech modeling without full model fine-tuning. First, we introduce a noisy teacher-student ensemble into a conventional adaptation scheme that leverages a frozen pre-trained acoustic model, and attain better performance than DP-based stochastic gradient descent (DPSGD). Next, we insert residual adapters (RAs) between the layers of the frozen pre-trained acoustic model. The RAs significantly reduce training cost and time with a negligible performance drop. Evaluated on the open-access Multilingual Spoken Words (MLSW) dataset, our solution with RAs reduces the number of trainable parameters by 97.5% at the cost of only a 4% performance drop relative to fine-tuning the cross-lingual speech classifier, while preserving DP guarantees.
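As a rough illustration of the noisy teacher-student idea mentioned above, the sketch below shows one common way such an ensemble can label a sample for the student: each teacher votes for a class, noise is added to the vote counts, and the noisy argmax is released. The abstract does not specify the noise mechanism, so the Laplace noise here is an assumption borrowed from PATE-style aggregation, and the names `laplace_noise`, `noisy_aggregate`, and `noise_scale` are hypothetical rather than taken from the paper.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale). `scale` must be positive.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_aggregate(teacher_votes, num_classes, noise_scale, rng):
    """Return the class with the highest noisy vote count.

    teacher_votes: list of class indices, one per teacher (assumed format).
    noise_scale:   Laplace scale; larger values mean more privacy and less accuracy.
    """
    counts = Counter(teacher_votes)
    noisy_counts = [
        counts.get(c, 0) + laplace_noise(noise_scale, rng)
        for c in range(num_classes)
    ]
    # Only the noisy argmax is exposed to the student, never the raw counts.
    return max(range(num_classes), key=noisy_counts.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    votes = [2, 2, 2, 1, 0]  # five hypothetical teachers, three classes
    print(noisy_aggregate(votes, num_classes=3, noise_scale=1.0, rng=rng))
```

In this setup the student only ever sees the noisy labels, so the privacy cost is governed by the noise scale and the number of labels queried rather than by the number of student training steps.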