Large Language Models (LLMs) are effective for data augmentation in classification tasks like intent detection. However, in some cases they inadvertently produce examples that are ambiguous with respect to non-target classes. We present DDAIR (Disambiguated Data Augmentation for Intent Recognition) to mitigate this problem. We use Sentence Transformers to detect ambiguous class-guided augmented examples generated by LLMs for intent recognition in low-resource scenarios: we identify synthetic examples that are semantically more similar to another intent than to their target one, and we provide an iterative re-generation method to mitigate such ambiguities. Our findings show that sentence embeddings effectively help to (re)generate less ambiguous examples, and suggest promising potential to improve classification performance in scenarios where intents are loosely or broadly defined.
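The detection criterion described above (flagging a synthetic example when it is semantically closer to a non-target intent than to its own) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the toy 2-D vectors are hypothetical stand-ins for real sentence embeddings, which in practice would come from a Sentence Transformer model, with per-intent centroids computed from the seed examples of each class.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_ambiguous(example_embs, targets, centroids):
    """Return indices of synthetic examples whose most similar intent
    centroid is NOT their target intent (i.e. ambiguous examples).

    example_embs: one embedding vector per synthetic example
    targets:      the target intent label of each example
    centroids:    dict mapping intent label -> centroid embedding
    """
    flagged = []
    for i, (emb, tgt) in enumerate(zip(example_embs, targets)):
        sims = {intent: cosine(emb, c) for intent, c in centroids.items()}
        if max(sims, key=sims.get) != tgt:
            flagged.append(i)
    return flagged

# Toy 2-D vectors standing in for sentence embeddings (hypothetical intents).
centroids = {"book_flight": np.array([1.0, 0.0]),
             "cancel_flight": np.array([0.0, 1.0])}
embs = [np.array([0.9, 0.1]),    # clearly matches its target intent
        np.array([0.2, 0.95])]   # labeled book_flight, but closer to cancel_flight
print(flag_ambiguous(embs, ["book_flight", "book_flight"], centroids))  # [1]
```

Flagged examples would then be handed back to the LLM for the iterative re-generation step, repeating until the example is closest to its intended intent or a retry budget is exhausted.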