Dialogue systems and large language models (LLMs) have attracted considerable attention. However, LLMs used directly as task-oriented dialogue (TOD) models have been found to underperform smaller task-specific models. Nonetheless, LLMs have significant potential, and better ways of leveraging their abilities are worth exploring. To this end, we propose User-Guided Response Optimization (UGRO), an approach that combines an LLM with a smaller TOD model. UGRO uses the LLM as an annotation-free user simulator that assesses dialogue responses, and uses the satisfaction feedback it generates to further optimize a smaller, supervised fine-tuned end-to-end TOD model. Specifically, the TOD model takes the dialogue history as input and, guided by the user simulator's feedback, generates high-satisfaction responses that meet the user's requirements. Experiments on two TOD benchmarks validate the effectiveness of our method, and the results show that it outperforms previous state-of-the-art (SOTA) results.
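The feedback loop described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the candidate responses, the keyword-based `toy_user_simulator` (a stand-in for prompting an LLM), and the REINFORCE-style update with a baseline. The abstract does not name the optimization algorithm or the prompt used, so this should not be read as the UGRO implementation, only as one way a satisfaction score could steer a response policy.

```python
import math
import random

# Hypothetical candidate responses for one dialogue history (illustrative only).
CANDIDATES = [
    "I don't know.",
    "There are several hotels. Which area do you prefer?",
    "I booked the Alpha Hotel for 2 nights. Reference: ABC123.",
]


def toy_user_simulator(history: str, response: str) -> float:
    """Stand-in for the LLM user simulator: returns a satisfaction score in [0, 1].

    A real system would prompt an LLM with the history and the response;
    a keyword heuristic is used here so the sketch runs offline.
    """
    text = response.lower()
    score = 0.0
    if "booked" in text:
        score += 0.7
    if "reference" in text:
        score += 0.3
    return score


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def optimize(history: str, steps: int = 500, lr: float = 0.5, seed: int = 0):
    """Shift a toy response policy toward responses the simulator rates highly."""
    rng = random.Random(seed)
    # The logits stand in for the parameters of the fine-tuned TOD model.
    logits = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a response from the current policy.
        idx = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        reward = toy_user_simulator(history, CANDIDATES[idx])
        # Expected reward under the current policy, used as a baseline.
        baseline = sum(
            p * toy_user_simulator(history, c) for p, c in zip(probs, CANDIDATES)
        )
        advantage = reward - baseline
        # Gradient of log pi(idx) with respect to the logits: one_hot(idx) - probs.
        for j in range(len(logits)):
            grad = (1.0 if j == idx else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    final_probs = optimize("User: I need a hotel for 2 nights.")
    for response, prob in zip(CANDIDATES, final_probs):
        print(f"{prob:.3f}  {response}")
```

In this sketch the policy chooses among three fixed strings; in a real setting the policy would be the TOD model's generation distribution and the score would come from the LLM's satisfaction judgment.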