Agent skills are widely supported by major agentic frameworks and perform well with proprietary models, yet their effectiveness for small and medium-sized open-source language models (270M-80B parameters) remains underexplored. We systematically study the Skill paradigm in resource-constrained industrial settings, where reliance on proprietary APIs is impractical due to data security and budget constraints. Across two open-source tasks and a real-world insurance claims classification task, we find that very small models struggle with reliable skill selection, while models in the 30B-80B range benefit substantially. Thinking variants show no major improvement from skills, particularly once the increased GPU usage caused by overthinking is taken into account. These findings reveal a trade-off between GPU cost and agent performance, and provide actionable insights for effective Skill configuration and SLM deployment in real-world settings.