Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems

Recently, two approaches, fine-tuning large pre-trained language models and variational training, have separately attracted significant interest for semi-supervised end-to-end task-oriented dialog (TOD) systems. In this paper, we propose the Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches. Among the many possible model designs, we propose a generative model and an inference model for variational learning of the end-to-end TOD system, both implemented as auto-regressive language models based on GPT-2, which can be further trained over a mix of labeled and unlabeled dialog data in a semi-supervised manner. Variational training of VLS-GPT is both statistically and computationally more challenging than previous variational learning work on sequential latent variable models, which uses turn-level first-order Markovian models. The inference model in VLS-GPT is non-Markovian due to the use of the Transformer architecture. In this work, we establish a Recursive Monte Carlo Approximation (RMCA) to the variational objective with a non-Markovian inference model and prove its unbiasedness. Further, we develop the computational strategy of sampling-then-forward-computation to realize RMCA, which overcomes the memory explosion issue of using GPT in variational learning and speeds up training. Semi-supervised TOD experiments are conducted on two benchmark multi-domain datasets in different languages: MultiWOZ2.1 and CrossWOZ. VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised self-training baselines.
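The abstract does not spell out RMCA or the sampling-then-forward-computation strategy, so the following is only a minimal toy sketch of the general idea, not the paper's method. It assumes binary latent states, a hand-written inference model `q_prob_one` whose turn-`t` distribution depends on the full latent history (hence non-Markovian), and a made-up generative model `log_joint`; all names and numbers are hypothetical. Latent states are first sampled turn by turn, then the fixed sample is scored in a separate pass, and the resulting Monte Carlo average is compared with the variational lower bound computed by exact enumeration.

```python
import itertools
import math
import random

T = 3            # number of turns in the toy dialog (hypothetical)
X = (1, 0, 1)    # fixed toy observations, one per turn (hypothetical)


def q_prob_one(history):
    """Toy inference model: q(h_t = 1 | h_1..h_{t-1}), using the full history."""
    return 1.0 / (1.0 + math.exp(-(0.3 * sum(history) - 0.2)))


def log_q(h):
    """log q(h_1..h_T), factorized turn by turn with the chain rule."""
    total = 0.0
    for t in range(len(h)):
        p1 = q_prob_one(h[:t])
        total += math.log(p1 if h[t] == 1 else 1.0 - p1)
    return total


def log_joint(h):
    """Toy log p(h, x): uniform prior on h_t; x_t equals h_t with prob. 0.8."""
    return sum(
        math.log(0.5) + math.log(0.8 if ht == xt else 0.2)
        for ht, xt in zip(h, X)
    )


def sample_latents(rng):
    """Step 1: draw h_1..h_T one turn at a time, each conditioned on the past."""
    h = []
    for _ in range(T):
        h.append(1 if rng.random() < q_prob_one(h) else 0)
    return tuple(h)


def mc_elbo(num_samples, seed=0):
    """Monte Carlo estimate of E_q[log p(h, x) - log q(h)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        h = sample_latents(rng)            # sampling pass
        total += log_joint(h) - log_q(h)   # step 2: score the fixed sample
    return total / num_samples


def exact_elbo():
    """Same expectation, computed by enumerating all 2**T latent sequences."""
    return sum(
        math.exp(log_q(h)) * (log_joint(h) - log_q(h))
        for h in itertools.product((0, 1), repeat=T)
    )
```

In this toy setup the evidence is p(x) = (0.5 · 0.8 + 0.5 · 0.2)^3 = 0.125, so `exact_elbo()` cannot exceed `math.log(0.125)`, and `mc_elbo(n)` should approach `exact_elbo()` as `n` grows. In an actual GPT-2 setting, the analogous split would presumably be to sample latent states without tracking gradients and then run one gradient-tracked forward pass over the completed sequence; that mapping is an assumption here, not something the abstract states.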

