Task-oriented dialog systems often rely on static exploration strategies that do not adapt to dynamic dialog contexts, leading to inefficient exploration and suboptimal performance. We propose DyBBT, a novel dialog policy learning framework that formalizes the exploration challenge through a structured cognitive state space capturing dialog progression, user uncertainty, and slot dependencies. DyBBT introduces a bandit-inspired meta-controller that dynamically switches between a fast, intuitive inference module (System 1) and a slow, deliberative reasoner (System 2) based on real-time cognitive states and visitation counts. Extensive experiments on single- and multi-domain benchmarks show that DyBBT achieves state-of-the-art performance in success rate, efficiency, and generalization, and human evaluations confirm that its decisions are well aligned with expert judgment.
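To make the switching idea concrete, the following is a minimal sketch of a count-based meta-controller. The abstract does not give the actual switching rule, so everything here is an assumption for illustration: the discretized `CognitiveState` tuple, the UCB-style bonus `sqrt(log(T + 1) / n(s))`, the `threshold` parameter, and the names `MetaController` and `choose` are all hypothetical and are not taken from DyBBT itself.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical discretization of the cognitive state:
# (dialog progress bucket, user uncertainty bucket, slot dependency bucket).
CognitiveState = Tuple[int, int, int]


class MetaController:
    """Illustrative count-based switch between System 1 and System 2.

    Assumption: rarely visited cognitive states receive a large UCB-style
    exploration bonus and are routed to the slow reasoner (System 2);
    frequently visited states fall back to the fast policy (System 1).
    """

    def __init__(self, threshold: float = 1.0) -> None:
        self.threshold = threshold  # assumed hyperparameter
        self.counts: Dict[CognitiveState, int] = defaultdict(int)
        self.total = 0  # total number of decisions made so far

    def exploration_bonus(self, state: CognitiveState) -> float:
        n = self.counts[state]
        if n == 0:
            return math.inf  # unseen states always trigger deliberation
        return math.sqrt(math.log(self.total + 1) / n)

    def choose(self, state: CognitiveState) -> str:
        """Return which system handles this turn, then update the counts."""
        bonus = self.exploration_bonus(state)
        system = "system2" if bonus > self.threshold else "system1"
        self.counts[state] += 1
        self.total += 1
        return system
```

Under these assumptions, a state that has never been visited is always sent to System 2, and repeated visits to the same state shrink its bonus until the controller hands it back to System 1. A real implementation would presumably also condition on the cognitive state's contents (for example, high user uncertainty), which this sketch deliberately omits.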