End-to-end automatic speech recognition (E2E-ASR) has the potential to improve recognition performance, but it has difficulty handling enharmonic words: named entities (NEs) that share the same pronunciation and part of speech but are spelled differently. This problem frequently arises with Japanese personal names, which can have the same pronunciation but be written with different Kanji characters. Because such NEs tend to be important keywords, an ASR system that misrecognizes them can quickly lose user trust. To address this problem, this paper proposes a novel retraining-free customization method for E2E-ASR based on a named-entity-aware E2E-ASR model and phoneme similarity estimation. Experimental results show that, when personal names are selected as the target NEs, the proposed method reduces the target NE character error rate by 35.7% on average relative to a conventional E2E-ASR model.
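The abstract does not specify how phoneme similarity is estimated. The sketch below shows the general idea of retraining-free customization under one plausible assumption: a recognized NE's phoneme sequence is compared against a user-registered dictionary using normalized Levenshtein similarity, and the closest registered spelling is substituted if it clears a threshold. The function names, the `threshold` value, the dictionary entries, and the phoneme readings are all hypothetical illustrations, not the paper's actual estimator.

```python
from typing import Dict, List, Optional

def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phoneme sequences (token-level)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution / match
        prev = cur
    return prev[len(b)]

def phoneme_similarity(a: List[str], b: List[str]) -> float:
    """Hypothetical similarity in [0, 1]: 1 - distance / longer length."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

def select_registered_ne(hyp_phonemes: List[str],
                         user_dict: Dict[str, List[str]],
                         threshold: float = 0.8) -> Optional[str]:
    """Return the registered spelling most similar to the recognized NE,
    or None if no entry reaches the (assumed) threshold."""
    best_word, best_score = None, -1.0
    for word, phonemes in user_dict.items():
        score = phoneme_similarity(hyp_phonemes, phonemes)
        if score > best_score:
            best_word, best_score = word, score
    return best_word if best_score >= threshold else None

# Hypothetical user dictionary: the user wants the spelling 佐東 (Satō),
# which is pronounced the same as the more common 佐藤.
user_dict = {
    "佐東": "s a t o o".split(),
    "加藤": "k a t o o".split(),
}

# Suppose the NE-aware model tagged 佐藤 as a personal name in its output.
transcript = "佐藤さんに連絡してください"
recognized_ne, ne_phonemes = "佐藤", "s a t o o".split()

replacement = select_registered_ne(ne_phonemes, user_dict)
if replacement is not None:
    transcript = transcript.replace(recognized_ne, replacement)
print(transcript)
```

Because the registered spelling is chosen by pronunciation rather than by surface form, the dictionary can be updated at any time without retraining the E2E-ASR model, which is the property the abstract emphasizes.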