Most language model agentic systems today are built and optimized for large language models (e.g., GPT, Claude, Gemini) accessed through API calls. While powerful, this approach has several limitations, including high token costs and privacy concerns for sensitive applications. We introduce EffGen, an open-source agentic framework optimized for small language models (SLMs) that enables effective, efficient, and secure local deployment. EffGen makes four major contributions: (1) enhanced tool calling with prompt optimization that compresses input prompts by up to 70-80% (57% on average across our benchmarks) while preserving task semantics; (2) intelligent task decomposition that breaks complex queries into parallel or sequential subtasks based on their dependencies; (3) complexity-based routing that uses five complexity factors to make routing decisions before execution begins; and (4) a unified memory system that combines short-term, long-term, and vector-based storage. EffGen also unifies multiple agent protocols (MCP, A2A, ACP) to enable cross-protocol communication. Across 13 benchmarks, EffGen outperforms LangChain, AutoGen, and Smolagents, achieving higher success rates, faster execution, and lower memory usage. Our results further reveal that prompt optimization and complexity routing scale in complementary ways: optimization benefits SLMs more (an 11.2% gain at 1.5B parameters vs. 2.4% at 32B), while routing benefits larger models more (3.6% at 1.5B vs. 7.9% at 32B), so combining the two yields consistent gains at every scale. EffGen is released under the Apache 2.0 License for both research and commercial use. The code is available at https://github.com/ctrl-gaurav/effGen, the Python package at https://pypi.org/project/effgen/ (pip install effgen), and the project website and documentation at https://effgen.org/ and https://docs.effgen.org/.
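To illustrate what splitting a query into "parallel or sequential subtasks based on dependencies" involves, the following minimal sketch applies a standard topological layering with Python's `graphlib`. It is not EffGen's implementation. It assumes the decomposer has already produced a dependency map from each subtask to the subtasks it needs. The function name `plan_stages` and the subtask names are hypothetical. Subtasks in the same stage have no dependencies on one another and could run in parallel, while the stages themselves run in sequence.

```python
from graphlib import TopologicalSorter


def plan_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Hypothetical helper: group subtasks into execution stages.

    `deps` maps each subtask to the set of subtasks it depends on.
    Subtasks within one stage are mutually independent (parallelizable);
    stages must run sequentially. A dependency cycle raises
    graphlib.CycleError, which is a subclass of ValueError.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # validates the graph and detects cycles
    stages: list[list[str]] = []
    while sorter.is_active():
        # Every subtask whose dependencies are all finished is ready now.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        # Mark the whole stage as finished to unlock the next layer.
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    # Hypothetical query: gather two facts, compare them, then summarize.
    plan = plan_stages({
        "search_a": set(),
        "search_b": set(),
        "compare": {"search_a", "search_b"},
        "summarize": {"compare"},
    })
    print(plan)
```

For the example dependency map, the sketch yields `[['search_a', 'search_b'], ['compare'], ['summarize']]`: the two searches form one parallel stage, followed by two sequential stages. In an agent runtime, each stage's subtasks could then be dispatched concurrently (for instance with `asyncio.gather`), which is an assumption about execution rather than a description of EffGen.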