While LLM-based agents have shown promise for deep research, most existing approaches rely on fixed workflows that struggle to adapt to real-world, open-ended queries. Recent work therefore explores self-evolution by allowing agents to rewrite their own code or prompts to improve problem-solving ability, but unconstrained optimization often triggers instability, hallucinations, and instruction drift. We propose EvoFSM, a structured self-evolving framework that achieves both adaptability and control by evolving an explicit Finite State Machine (FSM) instead of relying on free-form rewriting. EvoFSM decouples the optimization space into macroscopic Flow (state-transition logic) and microscopic Skill (state-specific behaviors), enabling targeted improvements under clear behavioral boundaries. Guided by a critic mechanism, EvoFSM refines the FSM through a small set of constrained operations, and further incorporates a self-evolving memory that distills successful trajectories as reusable priors and failure patterns as constraints for future queries. Extensive evaluations on five multi-hop QA benchmarks demonstrate the effectiveness of EvoFSM. In particular, EvoFSM reaches 58.0% accuracy on the DeepSearch benchmark. Additional results on interactive decision-making tasks further validate its generalization.
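The Flow/Skill decoupling and the constrained evolution operations described above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the class name `FSMAgent`, the `rewire`/`refine_skill` operations, and the toy states are all assumptions made for clarity.

```python
from dataclasses import dataclass, field

@dataclass
class FSMAgent:
    # Flow: macroscopic state-transition logic, keyed by (state, signal).
    flow: dict = field(default_factory=dict)
    # Skill: microscopic state-specific behaviors (callables).
    skills: dict = field(default_factory=dict)
    state: str = "plan"

    def step(self, observation: str) -> str:
        """Run the current state's Skill, then follow the Flow transition."""
        signal = self.skills[self.state](observation)
        self.state = self.flow.get((self.state, signal), self.state)
        return self.state

    # Constrained evolution operations: each edit touches exactly one Flow
    # edge or one Skill, leaving the rest of the machine intact -- this is
    # what keeps the optimization inside clear behavioral boundaries.
    def rewire(self, state: str, signal: str, target: str) -> None:
        self.flow[(state, signal)] = target

    def refine_skill(self, state: str, new_skill) -> None:
        self.skills[state] = new_skill


# Usage: a toy three-state research loop (plan -> search -> answer).
agent = FSMAgent(
    flow={("plan", "need_info"): "search", ("search", "enough"): "answer"},
    skills={
        "plan": lambda obs: "need_info",
        "search": lambda obs: "enough",
        "answer": lambda obs: "done",
    },
)
agent.step("query")  # plan -> search
agent.step("docs")   # search -> answer
print(agent.state)   # -> answer
```

Because Flow and Skill live in separate dictionaries, a critic can propose a targeted `rewire` or `refine_skill` edit without free-form rewriting of the whole agent, which is the control property the abstract emphasizes.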