The growth of agent skills has transformed how agentic systems are built, evaluated, and deployed. As skill libraries continue to scale, rigorous evaluation becomes critical to ensuring their utility, quality, and safety in real-world applications. Consequently, the field is shifting from isolated skill creation toward automated, evaluation-driven skill evolution. In this survey, we systematically examine the landscape of skill evolution and evaluation beyond foundational skill creation. We categorize evolution into four paradigms (execution feedback, trajectory distillation, compression, and reinforcement learning) and show how each contributes to improving skill utility and reliability. We also analyze six skill-centric benchmark categories, identifying structural gaps in benchmark coverage, trade-offs, and metric richness to advance skill research. Finally, we identify open directions for building skill ecosystems that are generalizable, efficient, and verifiably safe. The project URL is https://github.com/Cassie07/AgentSkill_Survey