Open intent detection, a crucial aspect of natural language understanding, involves identifying previously unseen intents in user-generated text. Despite progress in this field, handling novel combinations of linguistic components remains challenging, and this ability is essential for compositional generalization. In this paper, we present a case study that explores using ChatGPT for data augmentation to improve compositional generalization in open intent detection. We first discuss the limitations of existing benchmarks for evaluating this problem, highlighting the need for datasets that specifically target compositional generalization in open intent detection. We then show that incorporating synthetic data generated by ChatGPT into the training process effectively improves model performance. Rigorous evaluation on multiple benchmarks shows that our method outperforms existing techniques and significantly enhances open intent detection capabilities. Our findings underscore the potential of large language models such as ChatGPT for data augmentation in natural language understanding tasks.