ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Emergent Adaptation

LLM-powered agentic systems excel at complex long-horizon tasks, but remain constrained by static configurations fixed before execution. Such rigidity forces a trade-off between domain-specific performance and cross-task generalization: strong priors and compact tool spaces aid specialization but weaken transfer, while task-agnostic workflows and broad action spaces expand coverage but dilute guidance. Existing pre-execution optimization, planner-worker orchestration, and configuration patching fall short of resolving this tension, as they decouple adaptation from execution, causing information loss, fragmented optimization, and ambiguous credit assignment. We propose ToolSelf, a tool-driven runtime self-reconfiguration paradigm that abstracts configuration updates as a standardized tool interface and unifies execution and adaptation within one policy's action space. The execution agent can dynamically update sub-goals, strategies, toolboxes, context, and context-management modes based on task progress and feedback. We further introduce Configuration-Aware Two-stage Training (CAT), which combines rejection sampling fine-tuning with trajectory-level KTO reinforcement learning to internalize self-reconfiguration. Across diverse benchmarks, zero-shot ToolSelf rivals task-specialized agents; after CAT training, ToolSelf gains 28.8 points over the static-configuration baseline on average, illuminating a path toward emergent adaptivity that obviates manually injected guidance. The code is available at https://github.com/lian-tian-mo-zun/ToolSelf.
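The core idea of exposing configuration updates as a tool inside the same action space as task tools can be illustrated with a minimal sketch. This is not the authors' implementation: the names `AgentConfig`, `TASK_TOOLS`, `reconfigure`, and `step`, the two toy tools, and the `"full"` / `"summary"` context-mode values are all assumptions made for illustration. Only the five configuration fields come from the abstract's list (sub-goals, strategies, toolboxes, context, and context-management modes).

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Tuple

# Hypothetical task tools; a real agent would register its own.
TASK_TOOLS: Dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
    "upper": lambda text: text.upper(),
}


@dataclass(frozen=True)
class AgentConfig:
    """Runtime configuration; fields mirror the abstract's list."""
    sub_goal: str = ""
    strategy: str = ""
    toolbox: Tuple[str, ...] = ()
    context: Tuple[str, ...] = ()
    context_mode: str = "full"  # assumed values: "full" or "summary"


def reconfigure(config: AgentConfig, **updates) -> AgentConfig:
    """Return a new config with the requested fields replaced."""
    unknown = set(updates) - set(vars(config))
    if unknown:
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    if "toolbox" in updates:
        missing = [t for t in updates["toolbox"] if t not in TASK_TOOLS]
        if missing:
            raise ValueError(f"unregistered tools: {missing}")
        updates["toolbox"] = tuple(updates["toolbox"])
    if "context" in updates:
        updates["context"] = tuple(updates["context"])
    return replace(config, **updates)


def step(config: AgentConfig, action: dict) -> Tuple[AgentConfig, str]:
    """Dispatch one action; 'reconfigure' is just another tool name."""
    name, args = action["name"], action.get("args", {})
    if name == "reconfigure":
        try:
            return reconfigure(config, **args), "config updated"
        except ValueError as err:
            return config, f"error: {err}"
    if name not in config.toolbox:
        return config, f"error: tool '{name}' is not in the current toolbox"
    return config, TASK_TOOLS[name](**args)


cfg = AgentConfig(sub_goal="compute a sum", toolbox=("add",))
cfg, obs_add = step(cfg, {"name": "add", "args": {"a": 2, "b": 3}})
_, obs_blocked = step(cfg, {"name": "upper", "args": {"text": "hi"}})
cfg, obs_cfg = step(cfg, {"name": "reconfigure",
                          "args": {"sub_goal": "format the answer",
                                   "toolbox": ["add", "upper"]}})
_, obs_upper = step(cfg, {"name": "upper", "args": {"text": "hi"}})
```

In this sketch the policy emits a single stream of actions, and a `reconfigure` call changes which later actions are admissible. That is one way to read the phrase "unifies execution and adaptation within one policy's action space"; the actual interface in the released code may differ.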
