Few-shot keyword spotting (KWS) systems often use a sliding window of fixed size. Because keywords and their spoken instances vary in length, choosing the window size is difficult: a window should be long enough to contain all the information needed to recognize a keyword, but a longer window may also contain irrelevant information such as multiple words or noise, which makes it difficult to reliably detect the onsets and offsets of keywords. In this work, TempAdaCos is proposed, an angular margin loss for obtaining embeddings with temporal structure that can be used to detect keywords with dynamic time warping. In experiments conducted on KWS-DailyTalk, a few-shot KWS dataset presented in this work, it is shown that using these embeddings outperforms using other representations or a sliding window. Furthermore, it is shown that using time-reversed segments of the keywords during training improves performance.
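To illustrate how frame-level embeddings can be matched with dynamic time warping instead of a fixed window, the following is a minimal sketch of subsequence DTW. It is not the implementation used in this work: the function names, the cosine distance between frames, and the simple three-way step pattern are all assumptions made for illustration, and the embeddings are taken to be plain arrays of shape (frames, dimensions).

```python
import numpy as np


def cosine_distance_matrix(template, stream):
    """Pairwise cosine distances between the frames of two embedding sequences."""
    t = template / np.linalg.norm(template, axis=1, keepdims=True)
    s = stream / np.linalg.norm(stream, axis=1, keepdims=True)
    return 1.0 - t @ s.T


def subsequence_dtw(template, stream):
    """Find the span of `stream` that aligns best with `template`.

    Hypothetical helper: returns (cost per template frame, onset, offset),
    where onset and offset are inclusive frame indices into `stream`.
    """
    dist = cosine_distance_matrix(template, stream)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)        # accumulated alignment cost
    start = np.zeros((n, m), dtype=int)  # stream frame where each path began

    # A match may begin at any stream frame without penalty.
    acc[0, :] = dist[0, :]
    start[0, :] = np.arange(m)

    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        start[i, 0] = 0
        for j in range(1, m):
            steps = ((i - 1, j - 1), (i - 1, j), (i, j - 1))
            costs = [acc[p] for p in steps]
            best = int(np.argmin(costs))
            acc[i, j] = costs[best] + dist[i, j]
            start[i, j] = start[steps[best]]

    # A match may also end at any stream frame; pick the cheapest end point.
    offset = int(np.argmin(acc[-1, :]))
    onset = int(start[-1, offset])
    return float(acc[-1, offset]) / n, onset, offset
```

Because the alignment path may start and end at any frame of the longer sequence, the onset and offset of a detected keyword follow directly from the path rather than from a window length. In a detector, the returned cost would be compared against a threshold, which this sketch leaves to the caller.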