As Large Language Models (LLMs) are increasingly deployed as autonomous agents, they face a critical scalability bottleneck we term the "Generalization-Specialization Dilemma": monolithic agents equipped with extensive toolkits suffer from context pollution and attention decay, leading to hallucinations, while static multi-agent swarms introduce significant latency and resource overhead. This paper introduces a Self-Evolving Concierge System, a novel architecture built on a Dynamic Mixture of Experts (DMoE) approach. Unlike recent self-improving agents that rewrite their own codebase, our system preserves stability by dynamically restructuring its runtime environment, "hiring" specialized sub-agents based on real-time conversation analysis. We introduce an asynchronous "Meta-Cognition Engine" that detects capability gaps, a Least Recently Used (LRU) eviction policy that bounds resource consumption, and a novel "Surgical History Pruning" mechanism that mitigates refusal bias. Experimental results demonstrate that this architecture maintains high task success rates while consuming fewer tokens than static agent swarms.
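The combination of dynamic "hiring" and LRU eviction described above can be sketched as a recency-ordered pool of sub-agents. This is a minimal illustrative sketch, not the paper's implementation; the names `AgentPool`, `hire`, and `route` are hypothetical.

```python
from collections import OrderedDict

class AgentPool:
    """Illustrative sketch: a bounded pool of specialized sub-agents
    with Least Recently Used (LRU) eviction. Class and method names
    are hypothetical, not taken from the paper."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Maps agent name -> agent handle, ordered from least to most
        # recently used.
        self._agents: OrderedDict[str, object] = OrderedDict()

    def hire(self, name: str, agent: object):
        """Register a sub-agent; if the pool is full, evict the least
        recently used one and return its name (else None)."""
        if name in self._agents:
            self._agents.move_to_end(name)
        self._agents[name] = agent
        if len(self._agents) > self.capacity:
            evicted, _ = self._agents.popitem(last=False)  # drop LRU entry
            return evicted
        return None

    def route(self, name: str):
        """Fetch a sub-agent for a request and mark it most recently used."""
        agent = self._agents.get(name)
        if agent is not None:
            self._agents.move_to_end(name)
        return agent
```

Under this scheme, routing a conversation turn to a sub-agent refreshes its recency, so only sub-agents that have gone unused longest are evicted when the Meta-Cognition Engine hires a new specialist into a full pool.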