Flowchart-grounded troubleshooting dialogue (FTD) systems, which follow the instructions of a flowchart to diagnose users' problems in specific domains (e.g., vehicle, laptop), have been gaining research interest in recent years. However, collecting sufficient dialogues that are naturally grounded on flowcharts is costly, so FTD systems are impeded by scarce training data. To mitigate this data sparsity issue, we propose a plan-based synthetic data generation (PlanSDG) approach that generates diverse synthetic dialogue data at scale by transforming concise flowcharts into dialogues. Specifically, its generative model employs a variational-based framework with a hierarchical planning strategy that includes global and local latent planning variables. Experiments on the FloDial dataset show that synthetic dialogues produced by PlanSDG improve the performance of downstream tasks, including flowchart path retrieval and response generation, particularly in the Out-of-Flowchart settings. In addition, further analyses demonstrate the quality of the synthetic data generated by PlanSDG, both on paths that are covered by existing sample dialogues and on paths that are not.