The rise of Large Language Models (LLMs) has enabled agentic AI capable of complex reasoning and tool use; however, deploying such autonomy in pervasive computing environments remains challenging because embedded microcontrollers impose strict memory and energy constraints. Existing frameworks typically assume server-class resources or continuous connectivity, leaving a gap for deeply embedded systems. This paper proposes a modular reference architecture for Embedded Agent Systems that bridges the divide between deterministic real-time control and agentic intelligence. We introduce a tiered design that decouples On-Device Agents, which execute highly compressed neural networks and rule-based logic for low-latency, privacy-critical tasks, from Cloud-Augmented Agents, which leverage Small Language Models (SLMs) for higher-level reasoning and planning. A key contribution is a cross-cutting Governance Layer that provides observability, policy enforcement, and safety across distributed fleets of autonomous devices. Rather than presenting purely empirical benchmarks, we analyze the architectural design principles and the trade-offs among latency, energy, and reliable execution in resource-constrained environments.