Traditional task-oriented dialog systems cannot evolve from ongoing interactions or adapt to new domains after deployment, a critical limitation in real-world dynamic environments. Continual learning approaches depend on episodic retraining with human-curated data and thus fail to achieve autonomous lifelong improvement. While evolutionary computation and LLM-driven self-improvement offer promising mechanisms for dialog optimization, they lack a unified framework for holistic, iterative strategy refinement. To bridge this gap, we propose DarwinTOD, a lifelong self-evolving dialog framework that systematically integrates these two paradigms, enabling continuous strategy optimization from a zero-shot base without task-specific fine-tuning. DarwinTOD maintains an Evolvable Strategy Bank and operates through a dual-loop process: online multi-agent dialog execution with peer critique, and offline structured evolutionary operations that refine the strategy bank using accumulated feedback. This closed-loop design enables autonomous, continuous improvement without human intervention. Extensive experiments show that DarwinTOD surpasses previous state-of-the-art methods and exhibits continuous performance gains throughout evolution. Our work provides a novel framework for building dialog systems with lifelong self-evolution capabilities.
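To make the dual-loop design more concrete, the following is a minimal Python sketch of one plausible reading of it: online dialogs are run under strategies sampled from a bank, peer critique assigns scores, and an offline evolutionary step keeps and mutates the highest-scoring strategies. All class and function names (StrategyBank, run_dialog, peer_critique, evolve) are illustrative assumptions, not the paper's actual API, and the LLM-driven dialog and critique steps are stubbed out.

```python
# Hypothetical sketch of a dual-loop self-evolving strategy bank.
# Names and selection/mutation heuristics are assumptions for illustration only.
import random
from dataclasses import dataclass, field

@dataclass
class Strategy:
    text: str           # natural-language dialog strategy
    score: float = 0.0  # fitness accumulated from peer critiques

@dataclass
class StrategyBank:
    strategies: list = field(default_factory=list)

    def sample(self) -> Strategy:
        # noisy greedy selection of a strategy to condition the dialog agents on
        return max(self.strategies, key=lambda s: s.score + random.random())

def run_dialog(strategy: Strategy) -> str:
    # placeholder for online multi-agent dialog execution (LLM calls omitted)
    return f"dialog transcript conditioned on: {strategy.text}"

def peer_critique(transcript: str) -> float:
    # placeholder for critic agents scoring the transcript (LLM calls omitted)
    return random.uniform(0.0, 1.0)

def evolve(bank: StrategyBank) -> None:
    # offline loop: retain high-scoring strategies and mutate them into variants
    bank.strategies.sort(key=lambda s: s.score, reverse=True)
    survivors = bank.strategies[: max(1, len(bank.strategies) // 2)]
    mutants = [Strategy(text=s.text + " (refined)") for s in survivors]
    bank.strategies = survivors + mutants

bank = StrategyBank([Strategy("confirm slot values before calling the API"),
                     Strategy("ask one question at a time")])
for _ in range(3):                      # evolution generations (offline loop)
    for _ in range(4):                  # online dialogs per generation
        s = bank.sample()
        s.score += peer_critique(run_dialog(s))
    evolve(bank)                        # structured refinement of the bank
```

In this reading, the closed loop arises because critique scores gathered online directly drive which strategies survive and spawn variants offline, so no human-curated retraining data is needed between generations.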