Conversational systems often rely on embedding models for intent classification and intent clustering tasks. Large Language Models (LLMs) enable instructional embeddings, which allow one to adjust semantics over the embedding space using prompts, and these embeddings are being viewed as a panacea for such downstream conversational tasks. However, traditional evaluation benchmarks rely solely on task metrics that do not specifically measure gaps in semantic understanding. We therefore propose an intent semantic toolkit that gives a more holistic view of intent embedding models by considering three tasks: (1) intent classification, (2) intent clustering, and (3) a novel triplet task. The triplet task gauges a model's understanding of two semantic concepts that are paramount in real-world conversational systems: negation and implicature. We observe that current embedding models fare poorly in their semantic understanding of these concepts. To address this, we propose a pre-training approach that improves the embedding model by combining augmentation with data generated by an auto-regressive model and a contrastive loss term. Our approach improves the semantic understanding of the intent embedding model on the aforementioned linguistic dimensions while only slightly affecting its performance on downstream task metrics.
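As a rough illustration of how a triplet task and a contrastive term of this kind might look, the sketch below scores (anchor, positive, negative) embedding triplets by cosine similarity. The abstract does not specify the exact metric or loss, so the success criterion, the margin-based loss, the function names, and the toy vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_success_rate(triplets):
    """Fraction of triplets whose anchor is closer to the positive than to the negative.

    Assumed criterion: a triplet counts as a success when
    cos(anchor, positive) > cos(anchor, negative).
    """
    hits = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return hits / len(triplets)


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Assumed contrastive term: hinge on the cosine-similarity gap.

    The loss is zero once the positive is more similar to the anchor
    than the negative by at least `margin`.
    """
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)


# Hypothetical 2-d embeddings. In practice the anchor would be an utterance,
# the positive an implicature of the same intent, and the negative a negation.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # anchor is close to the positive
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),  # anchor is close to the negative
]

rate = triplet_success_rate(toy_triplets)
losses = [triplet_margin_loss(a, p, n) for a, p, n in toy_triplets]
```

Under these assumptions, an embedding model that places a negated utterance closer to the anchor than an implicature of the same intent fails the triplet and incurs a positive loss, which is the kind of gap that task-level metrics alone would not reveal.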