ASR correction methods have predominantly focused on general datasets and have not effectively utilized Pinyin information, which is unique to the Chinese language. In this study, we address this gap by proposing the Pinyin Enhanced Rephrasing Language Model (PERL), designed specifically for N-best correction scenarios. We also introduce a length predictor module to handle the variable-length problem. We conduct experiments on the Aishell-1 dataset and our newly proposed DoAD dataset. The results show that our approach outperforms baseline methods, achieving a 29.11% reduction in Character Error Rate (CER) on Aishell-1 and a reduction of around 70% on domain-specific datasets. Furthermore, by leveraging Pinyin similarity at the token level, our approach gains an advantage over the baselines and achieves superior performance.