Flowchart-oriented dialogue (FOD) systems aim to guide users through multi-turn decision-making or operational procedures by following a domain-specific flowchart to achieve a task goal. In this work, we formalize flowchart reasoning in FOD as grounding user input to flowchart nodes at each dialogue turn while ensuring that node transitions are consistent with the correct flowchart path. Despite recent advances of large language models (LLMs) in task-oriented dialogue systems, adapting them to FOD still faces two limitations: (1) LLMs lack an explicit mechanism to represent and reason over flowchart topology, and (2) they are prone to hallucinations, leading to unfaithful flowchart reasoning. To address these limitations, we propose FloCA, a zero-shot flowchart-oriented conversational agent. FloCA uses an LLM for intent understanding and response generation while delegating flowchart reasoning to an external tool that performs topology-constrained graph execution, ensuring faithful and logically consistent node transitions across dialogue turns. We further introduce an evaluation framework with an LLM-based user simulator and five new metrics covering reasoning accuracy and interaction efficiency. Extensive experiments on the FLODIAL and PFDial datasets highlight the bottlenecks of existing LLM-based methods and demonstrate the superiority of FloCA. Our code is available at https://github.com/Jinzi-Zou/FloCA-flowchart-reasoning.
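To make the division of labor concrete, the following is a minimal sketch of topology-constrained graph execution, not the FloCA implementation. It assumes a flowchart stored as a mapping from each node to its labeled outgoing edges. The names `Flowchart`, `FlowchartExecutor`, and `classify_intent`, as well as the toy troubleshooting chart, are hypothetical. The keyword matcher in `classify_intent` merely stands in for the LLM's intent-understanding step.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Flowchart:
    # edges[node][answer_label] -> next node; a node with no entry is terminal.
    edges: Dict[str, Dict[str, str]]
    start: str


@dataclass
class FlowchartExecutor:
    """Hypothetical external tool: tracks the current node and follows only existing edges."""
    chart: Flowchart
    current: str = field(init=False)
    path: List[str] = field(init=False)

    def __post_init__(self) -> None:
        self.current = self.chart.start
        self.path = [self.current]

    def allowed_answers(self) -> List[str]:
        # Edge labels leaving the current node; these bound what the LLM may choose.
        return sorted(self.chart.edges.get(self.current, {}))

    def step(self, answer: str) -> str:
        # Reject any transition that is not an edge of the flowchart.
        options = self.chart.edges.get(self.current, {})
        if answer not in options:
            raise ValueError(f"no edge {answer!r} from node {self.current!r}")
        self.current = options[answer]
        self.path.append(self.current)
        return self.current

    def is_terminal(self) -> bool:
        return self.current not in self.chart.edges


def classify_intent(utterance: str, allowed: List[str]) -> Optional[str]:
    """Stand-in for the LLM: return the first allowed label that occurs as a word."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for label in allowed:
        if label in words:
            return label
    return None  # unclear input: the agent would ask a clarifying question


# Toy chart, invented for illustration only.
chart = Flowchart(
    edges={
        "engine_cranks?": {"yes": "fuel_ok?", "no": "check_battery"},
        "fuel_ok?": {"yes": "check_spark_plugs", "no": "refuel"},
    },
    start="engine_cranks?",
)

executor = FlowchartExecutor(chart)
for user_turn in ["Yes, it cranks fine", "No fuel, I think"]:
    label = classify_intent(user_turn, executor.allowed_answers())
    if label is not None:
        executor.step(label)
print(executor.path)
```

Because the executor owns the dialogue state, the language model can only select among edges that actually leave the current node, so a hallucinated jump to an unrelated node raises an error instead of silently corrupting the path.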