Flowchart-grounded troubleshooting dialogue (FTD) systems, which follow the instructions of a flowchart to diagnose users' problems in specific domains (e.g., vehicle, laptop), have been gaining research interest in recent years. However, collecting sufficient dialogues that are naturally grounded on flowcharts is costly, so FTD systems are impeded by scarce training data. To mitigate this data sparsity issue, we propose a plan-based data augmentation (PlanDA) approach that generates diverse synthetic dialogue data at scale by transforming concise flowcharts into dialogues. Specifically, its generative model employs a variational-based framework with a hierarchical planning strategy that includes global and local latent planning variables. Experiments on the FloDial dataset show that synthetic dialogues produced by PlanDA improve the performance of downstream tasks, including flowchart path retrieval and response generation, particularly in the Out-of-Flowchart settings. In addition, further analyses demonstrate the quality of the synthetic data generated by PlanDA both on paths that are covered by existing sample dialogues and on paths that are not.