In a sentence, certain words are critical to its meaning. Among them, named entities (NEs) are notoriously challenging for neural models. Despite their importance, their accurate handling has been neglected in speech-to-text (S2T) translation research, and recent work has shown that S2T models perform poorly on locations and, notably, on person names, whose spelling is hard to get right unless known in advance. In this work, we explore how to leverage dictionaries of NEs that are likely to appear in a given context to improve S2T model outputs. Our experiments show that we can reliably detect the NEs likely to be present in an utterance starting from S2T encoder outputs. Indeed, we demonstrate that the current detection quality is sufficient to improve NE accuracy in translation, with a 31% reduction in person name errors.