Recently, Large Language Models (LLMs) have emerged as an alternative to training task-specific dialog agents, due to their broad reasoning capabilities and strong performance in zero-shot learning scenarios. However, many LLM-based dialog systems fall short at planning towards an overarching dialog goal and therefore cannot steer the conversation appropriately. Furthermore, these models are prone to hallucination, making them unsuitable for information access in sensitive domains, such as the legal or medical domain, where the correctness of information given to users is critical. The recently introduced Conversational Tree Search (CTS) task proposes using dialog graphs to avoid hallucination in sensitive domains; however, state-of-the-art CTS agents are based on Reinforcement Learning (RL) and, despite excelling at dialog strategy, require long training times. This paper introduces a novel zero-shot method for controllable CTS agents, in which LLMs guide dialog planning through domain graphs by searching and pruning relevant graph nodes based on the user's interaction preferences. We show that these agents significantly outperform state-of-the-art CTS agents in simulation ($p<0.0001$; Barnard Exact test), and that this result generalizes across all available CTS domains. Finally, we perform a user evaluation to test the agent's performance in the wild, showing that our policy significantly improves task success compared to the state-of-the-art RL-based CTS agent ($p<0.05$; Barnard Exact test).
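The core idea sketched in the abstract can be illustrated with a minimal example: an LLM-style relevance judge prunes branches of a domain dialog graph based on the user's stated preferences, so the agent only navigates subtrees that matter to the user. All names here (`DialogNode`, `is_relevant`, `prune`) are illustrative assumptions, and the relevance judge is a simple keyword stub standing in for an actual LLM call.

```python
from dataclasses import dataclass, field

@dataclass
class DialogNode:
    text: str                       # agent utterance / question at this node
    children: list = field(default_factory=list)

def is_relevant(node: DialogNode, user_pref: str) -> bool:
    """Keyword stub standing in for an LLM relevance call: keep a branch
    if any word of the user's preference appears somewhere in its subtree."""
    if any(w in node.text.lower() for w in user_pref.lower().split()):
        return True
    return any(is_relevant(c, user_pref) for c in node.children)

def prune(node: DialogNode, user_pref: str):
    """Drop subtrees the judge deems irrelevant; keep the rest intact."""
    if not is_relevant(node, user_pref):
        return None
    kept = [p for c in node.children if (p := prune(c, user_pref)) is not None]
    return DialogNode(node.text, kept)

# Toy domain graph: a root question with two topic branches.
root = DialogNode("How can I help?", [
    DialogNode("Questions about visa fees", [DialogNode("Fee amount")]),
    DialogNode("Questions about work permits", [DialogNode("Permit duration")]),
])

pruned = prune(root, "visa fees")
print([c.text for c in pruned.children])  # only the visa branch survives
```

In a real zero-shot agent, `is_relevant` would be replaced by a prompted LLM judgment over each node's text, and the pruned graph would constrain which questions the policy may ask next, keeping answers grounded in the graph rather than free generation.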