Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw historical traces that are redundant, noisy, and difficult to govern. More importantly, they rarely distinguish which memories are truly useful for future reasoning. This limits their ability to accumulate compact and reliable experience for long-horizon clinical reasoning. To close this gap, we propose SkeMex, a post-deployment self-evolution framework that improves medical agents through a skill-based memory without updating model weights. SkeMex distills informative interaction trajectories into structured skills that encode reusable procedural knowledge, and organizes them into a multi-branch repository spanning general, task-specific, and action-level experience. To determine which memories should be reused and retained, SkeMex estimates context-dependent utility from environment feedback and uses it to guide value-aware retrieval and repository governance. A closed-loop "Read-Write-Assess-Govern" lifecycle further supports continual evolution by writing new skills, updating utilities, promoting useful memories, and removing harmful entries. Experiments across diverse clinical tasks show that SkeMex consistently outperforms representative memory-based agents in both offline and online settings. It also generalizes across model backbones and supports transferable skill memory. All data and code will be released publicly.
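
The "Read-Write-Assess-Govern" lifecycle described above can be illustrated with a small, self-contained sketch. This is not the SkeMex implementation: the class and method names, the keyword-overlap similarity, the running-average utility update, and all thresholds are assumptions made only for illustration. In particular, promotion of useful skills between branches is not modeled here, and a real system would presumably use learned embeddings rather than keyword sets.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """One reusable piece of procedural knowledge (hypothetical schema)."""
    name: str
    branch: str            # "general", "task", or "action"
    keywords: frozenset    # crude stand-in for a learned embedding
    utility: dict = field(default_factory=dict)  # context -> running estimate
    uses: int = 0


class SkillMemory:
    """Toy skill repository with a read/write/assess/govern loop."""

    def __init__(self, step=0.5, prune_below=-0.5, min_uses=2):
        self.skills = {}
        self.step = step                # assumed learning rate for utilities
        self.prune_below = prune_below  # assumed removal threshold
        self.min_uses = min_uses        # evidence required before removal

    def write(self, skill):
        """Write: store a newly distilled skill."""
        self.skills[skill.name] = skill

    def read(self, query, context, k=1, weight=0.5):
        """Read: rank skills by similarity blended with utility in this context."""
        def score(skill):
            union = skill.keywords | query
            overlap = len(skill.keywords & query) / len(union) if union else 0.0
            return (1 - weight) * overlap + weight * skill.utility.get(context, 0.0)
        return sorted(self.skills.values(), key=score, reverse=True)[:k]

    def assess(self, name, context, reward):
        """Assess: move the context-specific utility toward the observed reward."""
        skill = self.skills[name]
        old = skill.utility.get(context, 0.0)
        skill.utility[context] = old + self.step * (reward - old)
        skill.uses += 1

    def govern(self):
        """Govern: drop skills whose mean utility is low after enough uses."""
        removed = []
        for name, skill in list(self.skills.items()):
            if skill.uses >= self.min_uses and skill.utility:
                mean = sum(skill.utility.values()) / len(skill.utility)
                if mean < self.prune_below:
                    del self.skills[name]
                    removed.append(name)
        return removed


# Usage with made-up skills and rewards.
memory = SkillMemory()
memory.write(Skill("confirm_allergies_before_prescribing", "general",
                   frozenset({"medication", "allergy"})))
memory.write(Skill("order_imaging_first", "action",
                   frozenset({"headache", "imaging"})))
memory.assess("confirm_allergies_before_prescribing", "prescribing", reward=1.0)
for _ in range(2):
    memory.assess("order_imaging_first", "triage", reward=-1.0)
removed = memory.govern()
top = memory.read(frozenset({"medication"}), context="prescribing")
```

In this sketch, retrieval depends on both relevance and estimated utility, so a skill that repeatedly receives negative feedback loses rank and is eventually removed, while no model weights are touched.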
