Large ASR models can inadvertently leak sensitive information, a risk that can be mitigated with formal privacy guarantees such as differential privacy (DP). However, traditional DP training is computationally expensive and can hurt model performance. Our study explores DP parameter-efficient fine-tuning as a way to mitigate privacy risks for ASR models at lower computational and performance cost. Through extensive experimentation and progressive optimization, we achieve 4.6%/8.1% word error rate on the LibriSpeech test-clean/test-other sets, setting a new performance benchmark while maintaining (10, 3.52e-6)-DP when fine-tuning a large ASR model with over 600M parameters.
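Since the abstract only names the technique, the following is a minimal PyTorch sketch of what DP parameter-efficient fine-tuning looks like, under two stated assumptions: the PEFT method is a LoRA-style low-rank adapter (the paper may use a different variant), and DP-SGD is implemented naively with a per-example loop. `LoRALinear`, `dp_sgd_step`, and all hyperparameters (rank, alpha, clip_norm, noise_multiplier) are illustrative, and the privacy accounting needed to certify an (epsilon, delta) guarantee such as (10, 3.52e-6) is not shown.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a small trainable low-rank adapter (illustrative)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # backbone stays frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)       # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

def dp_sgd_step(model, loss_fn, xb, yb, optimizer,
                clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step over the trainable (adapter) parameters only:
    per-example gradient clipping followed by Gaussian noise."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    n = xb.shape[0]
    for i in range(n):                           # per-example gradients
        optimizer.zero_grad()
        loss_fn(model(xb[i:i + 1]), yb[i:i + 1]).backward()
        grads = [p.grad.detach().clone() for p in params]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        factor = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(factor * g)                   # accumulate clipped gradient
    optimizer.zero_grad()
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm
        p.grad = (s + noise) / n                 # noisy averaged gradient
    optimizer.step()

# Hypothetical usage: wrap one projection layer of a frozen encoder
layer = LoRALinear(nn.Linear(512, 512))
opt = torch.optim.SGD([p for p in layer.parameters() if p.requires_grad], lr=0.1)
x, y = torch.randn(8, 512), torch.randn(8, 512)
dp_sgd_step(layer, nn.MSELoss(), x, y, opt)
```

Because only the adapter's parameters receive gradients, the per-example clipping and noising touch a small fraction of the 600M+ weights, which is the source of the computational savings the abstract refers to.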