Recent advanced methods in Natural Language Understanding (NLU) for Task-oriented Dialogue (TOD) systems (e.g., intent detection and slot filling) require large amounts of annotated data to achieve competitive performance. In practice, token-level annotations (slot labels) are time-consuming and difficult to acquire. In this work, we study the Slot Induction (SI) task, whose objective is to induce slot boundaries without explicit knowledge of token-level slot annotations. We propose leveraging unsupervised Pre-trained Language Model (PLM) probing and contrastive learning mechanisms to exploit (1) unsupervised semantic knowledge extracted from the PLM, and (2) additional sentence-level intent label signals available from TOD. Our approach is shown to be effective on the SI task and capable of bridging the gap with token-level supervised models on two NLU benchmark datasets. When generalized to emerging intents, our SI objectives also provide enhanced slot label representations, leading to improved performance on the slot filling task.