Spoken Language Understanding (SLU) models are a core component of voice assistants (VAs) such as Alexa, Bixby, and Google Assistant. In this paper, we introduce a pipeline for extending SLU systems to new languages, built on Large Language Models (LLMs) that we fine-tune for machine translation of slot-annotated SLU training data. Our approach improves on the MultiATIS++ benchmark, a primary multilingual SLU dataset, in the cloud scenario with an mBERT model: Overall Accuracy rises from 53% for the existing state-of-the-art method, the Fine and Coarse-grained Multi-Task Learning Framework (FC-MTLF), to 62.18%. In the on-device scenario (a tiny, non-pretrained SLU model), our method improves Overall Accuracy from 5.31% for the baseline Global-Local Contrastive Learning Framework (GL-CLeF) to 22.06%. Unlike both FC-MTLF and GL-CLeF, our LLM-based machine translation requires no changes to the production SLU architecture. Additionally, our pipeline is slot-type independent: it requires no slot definitions or examples.
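The core challenge the pipeline addresses is carrying slot annotations through translation. The sketch below illustrates one common marker-based format for this: slot spans are wrapped in inline brackets before translation and recovered into BIO tags afterwards. This is a minimal illustration, not the paper's actual implementation; the bracket format, function names, and the ATIS-style example are our own assumptions, and the fine-tuned LLM translator is omitted (we only round-trip the marked string to show that annotations survive the format).

```python
# Hypothetical sketch: preserving slot annotations across machine
# translation by wrapping slot spans in [SLOT_TYPE: ...] markers.
# The marker format and function names are illustrative assumptions,
# not the paper's implementation; the LLM translation step is stubbed out.
import re

def wrap_slots(tokens, bio_tags):
    """Wrap each slot span in [SLOT_TYPE: ...] markers so the
    annotation can be carried through a translation step."""
    out, i = [], 0
    while i < len(tokens):
        tag = bio_tags[i]
        if tag.startswith("B-"):
            slot = tag[2:]
            span = [tokens[i]]
            i += 1
            while i < len(tokens) and bio_tags[i] == f"I-{slot}":
                span.append(tokens[i])
                i += 1
            out.append(f"[{slot}: {' '.join(span)}]")
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

def unwrap_to_bio(marked):
    """Recover tokens and BIO tags from a marker-annotated string."""
    tokens, tags = [], []
    for part in re.split(r"(\[[^\]]+\])", marked):
        part = part.strip()
        if not part:
            continue
        m = re.fullmatch(r"\[([^:]+):\s*(.+)\]", part)
        if m:
            slot, words = m.group(1), m.group(2).split()
            tokens += words
            tags += [f"B-{slot}"] + [f"I-{slot}"] * (len(words) - 1)
        else:
            words = part.split()
            tokens += words
            tags += ["O"] * len(words)
    return tokens, tags

# ATIS-style example utterance with BIO slot tags
toks = ["flights", "from", "boston", "to", "denver"]
tags = ["O", "O", "B-fromloc", "O", "B-toloc"]
marked = wrap_slots(toks, tags)
# marked == "flights from [fromloc: boston] to [toloc: denver]"
# In the full pipeline, `marked` would be sent to the fine-tuned LLM
# translator; here we round-trip it unchanged to verify the format.
assert unwrap_to_bio(marked) == (toks, tags)
```

Because the markers name the slot type inline rather than relying on a fixed slot inventory, a scheme like this stays slot-type independent, consistent with the abstract's claim that no slot definitions or examples are required.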