Accurate transcription of proper names and technical terms is particularly important in speech-to-text applications for business conversations. These words are essential to understanding the conversation, yet they are often rare and therefore likely to be under-represented in text and audio training data, which makes them a significant challenge in this domain. We present a two-step keyword boosting mechanism that operates on normalized unigrams and n-grams rather than on single tokens only, which eliminates the missed hits that occur when raw targets are boosted. In addition, we show how adjusting the boosting weight logic avoids over-boosting multi-token keywords. Together, these changes improve our keyword recognition rate by 26% relative on our proprietary in-domain dataset and by 2% on LibriSpeech. The method is particularly useful for targets that contain non-alphabetic characters or have non-standard pronunciations.
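The abstract does not spell out the two steps or the exact weighting rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that boosting is applied as a score bonus when rescoring an n-best list, that normalization means lowercasing and splitting on non-alphanumeric characters, and that over-boosting is avoided by giving each matched keyword one fixed bonus instead of one bonus per token. All names (`normalize`, `boost_score`, `rescore`, `per_token`) are hypothetical and do not come from the paper.

```python
import re
from typing import List, Sequence, Tuple


def normalize(text: str) -> Tuple[str, ...]:
    """Lowercase, turn non-alphanumeric runs into spaces, split into tokens.

    Assumption: this stands in for whatever normalization the system uses.
    """
    return tuple(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def count_ngram(tokens: Tuple[str, ...], ngram: Tuple[str, ...]) -> int:
    """Count occurrences of `ngram` as a contiguous run inside `tokens`."""
    n = len(ngram)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == ngram)


def boost_score(hypothesis: str, keywords: Sequence[str],
                weight: float = 1.0, per_token: bool = False) -> float:
    """Bonus for a hypothesis based on normalized keyword n-gram hits.

    per_token=False (assumed fix): one fixed bonus per matched keyword.
    per_token=True: bonus grows with keyword length, which over-rewards
    multi-token keywords.
    """
    tokens = normalize(hypothesis)
    bonus = 0.0
    for keyword in keywords:
        ngram = normalize(keyword)
        if not ngram:
            continue  # keyword had no alphanumeric content
        unit = weight * len(ngram) if per_token else weight
        bonus += unit * count_ngram(tokens, ngram)
    return bonus


def rescore(nbest: List[Tuple[str, float]], keywords: Sequence[str],
            weight: float = 1.0) -> Tuple[str, float]:
    """Return the (text, score) pair with the highest boosted score."""
    return max(nbest, key=lambda h: h[1] + boost_score(h[0], keywords, weight))


# Hypothetical n-best list with made-up log scores.
nbest = [("we upgraded to why five six e", -4.0),
         ("we upgraded to wi fi 6e", -4.5)]
best = rescore(nbest, ["Wi-Fi 6E"])
```

In this toy case the raw target `"Wi-Fi 6E"` is not a substring of either hypothesis, so boosting the raw string would miss, while the normalized trigram `("wi", "fi", "6e")` matches the second hypothesis. With `per_token=True` the same keyword would receive three times the bonus, which illustrates the kind of over-boosting that the adjusted weight logic is meant to avoid.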