Over the past few years, self-supervised speech representations have emerged as effective replacements for conventional surface representations in Spoken Language Understanding (SLU) tasks. In parallel, multilingual models trained on massive textual data were introduced to encode language-agnostic semantics. Recently, the SAMU-XLSR approach introduced a way to leverage such textual models to enrich multilingual speech representations with language-agnostic semantics. Aiming for better semantic extraction on a challenging SLU task while taking computation costs into account, this study investigates a specific in-domain semantic enrichment of the SAMU-XLSR model by specializing it on a small amount of transcribed data from the downstream task. In addition, we show the benefits of using same-domain French and Italian benchmarks for low-resource language portability, and we explore the cross-domain capacities of the enriched SAMU-XLSR.