Lifelong agents should expand their competence over time without retraining from scratch or overwriting previously learned behaviors. We investigate this in a challenging real-time control setting (Dark Souls III) by representing combat as a directed skill graph and training its components in a hierarchical curriculum. The resulting agent decomposes control into five reusable skills: camera control, target lock-on, movement, dodging, and a heal-attack decision policy, each optimized for a narrow responsibility. This factorization improves sample efficiency by reducing the burden on any single policy and supports selective post-training: when the environment shifts from Phase 1 to Phase 2, only a subset of skills must be adapted, while upstream skills remain transferable. Empirically, we find that targeted fine-tuning of just two skills rapidly recovers performance under a limited interaction budget, suggesting that skill-graph curricula together with selective fine-tuning offer a practical pathway toward evolving, continually learning agents in complex real-time environments.
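As a rough illustration of the selective post-training idea above, the factorization can be sketched as a directed skill graph with a frozen/trainable split. The five skill names come from the abstract, but the edges, the dependency-closure rule, and the choice of which skills shift between phases are illustrative assumptions, not the paper's actual implementation:

```python
# Sketch of a directed skill graph and selective fine-tuning.
# Skill names follow the abstract; edges and selection logic are
# hypothetical assumptions for illustration only.

SKILL_GRAPH = {
    # skill: list of upstream skills whose outputs it consumes
    "camera_control": [],
    "target_lock_on": ["camera_control"],
    "movement": ["target_lock_on"],
    "dodging": ["movement"],
    "heal_attack_policy": ["target_lock_on", "movement"],
}

def downstream_closure(changed):
    """Skills to revisit when `changed` skills shift: the changed
    skills plus every skill that (transitively) depends on them."""
    affected = set(changed)
    grew = True
    while grew:
        grew = False
        for skill, deps in SKILL_GRAPH.items():
            if skill not in affected and affected.intersection(deps):
                affected.add(skill)
                grew = True
    return affected

def finetune_plan(changed):
    """Partition skills into frozen (reused as-is) and trainable."""
    trainable = downstream_closure(changed)
    frozen = set(SKILL_GRAPH) - trainable
    return frozen, trainable

# E.g., if the Phase-1 -> Phase-2 shift only invalidates two skills,
# the upstream perception/control skills stay frozen and transferable.
frozen, trainable = finetune_plan({"dodging", "heal_attack_policy"})
```

Under these assumed edges, a shift touching only `dodging` and `heal_attack_policy` leaves `camera_control`, `target_lock_on`, and `movement` frozen, mirroring the abstract's claim that only a subset of skills needs adaptation while upstream skills remain transferable.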