End-to-end (E2E) approaches to keyword search (KWS) are considerably simpler to train and index than approaches that rely on the output of automatic speech recognition (ASR) systems. This simplification, however, comes at the cost of modularity. In particular, whereas ASR-based KWS systems can benefit from external unpaired text through a language model, current formulations of E2E KWS have no such mechanism. In this paper, we therefore propose a multitask training objective that allows unpaired text to be integrated into E2E KWS without complicating indexing and search. In addition to training an E2E KWS model to retrieve text queries from spoken documents, we jointly train it to retrieve text queries from masked written documents. We show empirically that this approach effectively leverages unpaired text for KWS, yielding significant improvements in search performance across a wide variety of languages. Our analysis indicates that these improvements arise because the proposed method improves document representations for words that appear in the unpaired text. Finally, we show that the proposed method can be used for domain adaptation in settings where in-domain paired data is scarce or nonexistent.
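The text branch of the multitask objective can be illustrated with a minimal sketch. A written document from unpaired text is masked, paired with a text query, and labeled by whether the query occurs in it. The loss on such examples is then added to the usual spoken-document loss. Several details here are illustrative assumptions and are not stated in this abstract: the `<mask>` symbol, independent per-token masking with probability `mask_prob`, computing the label on the unmasked text, and combining the two losses as a weighted sum with a hypothetical `text_weight`. The KWS model and the per-task losses are left abstract.

```python
import random
from typing import List, Sequence, Tuple

MASK = "<mask>"  # hypothetical mask symbol (assumption)


def mask_document(tokens: Sequence[str], mask_prob: float, rng: random.Random) -> List[str]:
    """Replace each token with MASK independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else t for t in tokens]


def contains_query(tokens: Sequence[str], query: Sequence[str]) -> bool:
    """Return True if the query occurs as a contiguous token span of the document."""
    n = len(query)
    return any(list(tokens[i:i + n]) == list(query) for i in range(len(tokens) - n + 1))


def make_text_example(
    tokens: Sequence[str], query: Sequence[str], mask_prob: float, rng: random.Random
) -> Tuple[List[str], List[str], int]:
    """Build one (masked document, query, label) example from unpaired text.

    Assumption: the label is computed on the *unmasked* text, so a masked
    query word must be inferred from the surrounding context.
    """
    label = int(contains_query(tokens, query))
    return mask_document(tokens, mask_prob, rng), list(query), label


def multitask_loss(speech_loss: float, text_loss: float, text_weight: float = 1.0) -> float:
    """Weighted sum of the spoken-document and masked-text KWS losses (assumed form)."""
    return speech_loss + text_weight * text_loss


if __name__ == "__main__":
    rng = random.Random(0)
    doc = "the cat sat on the mat".split()
    masked, query, label = make_text_example(doc, ["the", "mat"], mask_prob=0.3, rng=rng)
    print(masked, query, label)
```

Because only the training objective changes, the indexing and search path (spoken document in, query in, score out) is untouched in this sketch, which is consistent with the abstract's claim that the method does not complicate indexing and search.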