Traditional task-oriented dialog systems cannot evolve from ongoing interactions or adapt to new domains after deployment, a critical limitation in dynamic real-world environments. Continual learning approaches depend on episodic retraining with human-curated data and therefore fail to achieve autonomous lifelong improvement. While evolutionary computation and LLM-driven self-improvement offer promising mechanisms for dialog optimization, they lack a unified framework for holistic, iterative strategy refinement. To bridge this gap, we propose DarwinTOD, a lifelong self-evolving dialog framework that systematically integrates these two paradigms, enabling continuous strategy optimization from a zero-shot base without task-specific fine-tuning. DarwinTOD maintains an Evolvable Strategy Bank and operates through a dual-loop process: online multi-agent dialog execution with peer critique, and offline structured evolutionary operations that refine the strategy bank using accumulated feedback. This closed-loop design enables autonomous, continuous improvement without human intervention. Extensive experiments show that DarwinTOD surpasses previous state-of-the-art methods and exhibits continuous performance gains throughout evolution. Our work provides a novel framework for building dialog systems with lifelong self-evolution capabilities.
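The dual-loop process described above can be illustrated with a minimal toy sketch. This is not DarwinTOD's implementation: every name here (`Strategy`, `online_loop`, `offline_loop`, `evolve`, `toy_critic`, `toy_mutate`) is hypothetical, the strategy bank is assumed to be a plain list, and the LLM agents, peer critique, and evolutionary operators are replaced by simple stand-in functions so that the control flow is visible.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Strategy:
    """One entry in a toy strategy bank (hypothetical structure)."""
    text: str
    scores: list = field(default_factory=list)

    def fitness(self) -> float:
        # Mean critique score so far; unscored strategies count as 0.0.
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


def toy_critic(text: str) -> float:
    # Stand-in for peer critique: rewards strategies that mention confirmation.
    return 1.0 if "confirm" in text.lower() else 0.0


def toy_mutate(text: str) -> str:
    # Stand-in for an evolutionary operator (assumed; a real system might use an LLM).
    return text + " Confirm details with the user."


def online_loop(bank, critic, n_dialogs, rng):
    """Online loop: sample a strategy per dialog and record its critique score."""
    for _ in range(n_dialogs):
        strategy = rng.choice(bank)
        strategy.scores.append(critic(strategy.text))


def offline_loop(bank, mutate, keep):
    """Offline loop: keep the fittest strategies and add one mutated child of each."""
    survivors = sorted(bank, key=lambda s: s.fitness(), reverse=True)[:keep]
    children = [Strategy(mutate(s.text)) for s in survivors]
    return survivors + children


def evolve(bank, critic, mutate, generations=3, n_dialogs=20, keep=2, seed=0):
    """Alternate the online and offline loops for a fixed number of generations."""
    rng = random.Random(seed)
    for _ in range(generations):
        online_loop(bank, critic, n_dialogs, rng)
        bank = offline_loop(bank, mutate, keep)
    return bank
```

Calling `evolve([Strategy("Ask for the destination."), Strategy("Greet the user.")], toy_critic, toy_mutate)` returns a bank of `2 * keep` strategies. In this toy setting, feedback gathered online decides which strategies survive the offline step, which is the closed-loop idea the abstract describes.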