This research addresses the challenge of developing speech applications for zero-resource languages, which lack labelled data. It focuses on acoustic word embeddings (AWEs), fixed-dimensional representations of variable-duration speech segments, and on multilingual transfer, in which labelled data from several well-resourced languages are used for pretraining. The study introduces a new neural network that outperforms existing AWE models on zero-resource languages and explores how the choice of well-resourced languages affects performance. The AWEs are then applied in a keyword-spotting system for hate speech detection in Swahili radio broadcasts, demonstrating robustness in a real-world setting. Additionally, novel semantic AWE models improve semantic query-by-example search.