Previous work on spoken language understanding (SLU) mainly focuses on single-intent settings, where each input utterance contains only one user intent. This configuration significantly limits the surface form of user utterances and the capacity of the output semantics. In this work, we first propose MIVS, a Multi-Intent dataset collected from a realistic in-Vehicle dialogue System. The target semantic frame is organized in a 3-layer hierarchical structure to tackle the alignment and assignment problems that arise in multi-intent cases. Accordingly, we devise BiRGAT, a model that encodes the hierarchy of ontology items and whose backbone is a dual relational graph attention network. Coupled with a 3-way pointer-generator decoder, our method outperforms traditional sequence labeling and classification-based schemes by a large margin.
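The abstract does not spell out how the 3-way pointer-generator decoder combines its outputs. One common formulation, assumed here rather than taken from the paper, mixes three distributions with a softmax gate: generating a token from a fixed vocabulary, copying a token from the input utterance, and copying an item from the ontology. The plain-Python sketch below illustrates only that mixing step. The function name `three_way_mixture`, its parameters, and the example tokens and ontology items are all hypothetical; in a real decoder the gate logits and scores would come from the decoder state and attention over the encoder outputs.

```python
import math
from collections import defaultdict


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def three_way_mixture(gate_logits, vocab, vocab_logits,
                      utterance, utt_scores, ontology, onto_scores):
    """Hypothetical 3-way pointer-generator mixing step (an assumption,
    not the paper's code): generate from vocab, copy from the utterance,
    or copy an ontology item. Returns a distribution over the extended
    vocabulary as a dict mapping token -> probability."""
    # Gate: how much probability mass each of the three actions receives.
    p_gen, p_utt, p_onto = softmax(gate_logits)
    final = defaultdict(float)
    # 1) Generate a token from the fixed target vocabulary.
    for tok, p in zip(vocab, softmax(vocab_logits)):
        final[tok] += p_gen * p
    # 2) Copy a token from the utterance; repeated tokens accumulate mass.
    for tok, p in zip(utterance, softmax(utt_scores)):
        final[tok] += p_utt * p
    # 3) Copy an ontology item (hypothetical item names below).
    for item, p in zip(ontology, softmax(onto_scores)):
        final[item] += p_onto * p
    return dict(final)


# Usage with made-up inputs: equal gates and uniform scores.
dist = three_way_mixture(
    gate_logits=[0.0, 0.0, 0.0],
    vocab=["(", ")", "="],
    vocab_logits=[1.0, 1.0, 1.0],
    utterance=["play", "jazz"],
    utt_scores=[0.0, 0.0],
    ontology=["music.play", "navigation.go"],
    onto_scores=[0.0, 0.0],
)
print(dist)
```

Because the gate is itself a softmax and each branch is a normalized distribution, the mixed output always sums to one, and a token that appears both in the vocabulary and in the utterance simply receives mass from both branches.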