Fine-tuning generic ASR models with large-scale synthetic personal data can enhance personalization, but it introduces two challenges: adapting to synthetic personal data without forgetting knowledge learned from real data, and adapting to personal data without forgetting generic knowledge. Since the functionally invariant path (FIP) framework enables model adaptation while preserving prior knowledge, in this letter we introduce FIP into synthetic-data-augmented personalized ASR. However, when FIP is applied to train the model on all three types of data simultaneously, the model still struggles to balance the learning of synthetic, personalized, and generic knowledge. To decouple this learning process and address the two challenges above, we integrate a gated parameter-isolation strategy into FIP and propose the knowledge-decoupled functionally invariant path (KDFIP) framework, which stores generic and personalized knowledge in separate modules and applies FIP to them sequentially. Specifically, KDFIP adapts the personalized module to synthetic and real personal data and the generic module to generic data. Both modules are updated along personalization-invariant paths, and their outputs are dynamically fused through a gating mechanism. With augmented synthetic data, KDFIP achieves a 29.38% relative character-error-rate reduction on target speakers while maintaining generalization performance comparable to that of the unadapted ASR baseline.