Embedded devices, from wildlife monitoring stations to clinical wearables, require local AI inference because of latency, communication, or privacy constraints. Optimizing models for heterogeneous microcontrollers (MCUs) requires simultaneously satisfying hard physical constraints on memory, power, and temperature while preserving accuracy, a multidimensional optimization that is performed manually by experts today. We ask whether an LLM agent can autonomously navigate this complex, multi-turn pipeline when guided by real hardware feedback. To this end, we introduce a hardware-in-the-loop agent arena in which the agent iteratively refines both model and firmware (compiling, flashing, and measuring on real hardware), enabling closed-loop optimization. Frontier models, including Claude Opus 4.7 and Gemini 3.1 Pro, fail entirely without hardware feedback (0% deployment success), whereas our hardware-in-the-loop formulation achieves its first successful deployment within three iterations and can surpass human expert results within seven. This agentic co-optimization achieves 250x compression for vision models with <3.3% accuracy loss and 400x compression for audio models with <6% Feature Error Rate (FER) loss, enabling battery-free operation on a commercial MCU via solar harvesting. We demonstrate practical impact in two real-world systems: an elk-detection camera trap (96.7% accuracy) and a phonetic-transcription wearable (8.44% FER) for child development research.
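The closed-loop idea described in the abstract (propose a model and firmware configuration, measure it on the device, feed violations back, repeat) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the names `Candidate`, `Budget`, `simulated_board`, and `rule_based_agent` are hypothetical, the cost formulas and budget values are invented, and a simple rule-based policy stands in for the LLM agent. In a real arena, `simulated_board` would be replaced by a routine that compiles the firmware, flashes the MCU, and reads back measurements.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    width: float  # hypothetical channel-width multiplier
    bits: int     # hypothetical weight bit-width


@dataclass(frozen=True)
class Measurement:
    flash_kb: float
    power_mw: float
    accuracy: float


@dataclass(frozen=True)
class Budget:
    flash_kb: float
    power_mw: float
    min_accuracy: float


def violations(m: Measurement, b: Budget) -> list[str]:
    """Return the names of the hard constraints that a measurement breaks."""
    broken = []
    if m.flash_kb > b.flash_kb:
        broken.append("flash")
    if m.power_mw > b.power_mw:
        broken.append("power")
    if m.accuracy < b.min_accuracy:
        broken.append("accuracy")
    return broken


def simulated_board(c: Candidate) -> Measurement:
    """Stand-in for compile/flash/measure; the formulas are invented."""
    return Measurement(
        flash_kb=4000 * c.width ** 2 * c.bits / 32,
        power_mw=5 + 40 * c.width,
        accuracy=0.99 - 0.05 * (1 - c.width) - 0.002 * (32 - c.bits),
    )


def rule_based_agent(c: Candidate, broken: list[str]) -> Candidate:
    """Stand-in for the LLM agent: quantize first, then shrink the model."""
    if "flash" in broken and c.bits > 8:
        return replace(c, bits=8)
    if "flash" in broken or "power" in broken:
        return replace(c, width=c.width / 2)
    return c  # this toy policy has no remedy for accuracy-only violations


def closed_loop(
    start: Candidate,
    budget: Budget,
    measure: Callable[[Candidate], Measurement],
    propose: Callable[[Candidate, list[str]], Candidate],
    max_iters: int = 10,
) -> Optional[tuple[Candidate, Measurement, int]]:
    """Iterate propose -> measure until all constraints hold or iterations run out."""
    cand = start
    for i in range(1, max_iters + 1):
        m = measure(cand)
        broken = violations(m, budget)
        if not broken:
            return cand, m, i
        cand = propose(cand, broken)  # measurement feedback drives the next proposal
    return None


budget = Budget(flash_kb=128, power_mw=30, min_accuracy=0.90)
start = Candidate(width=1.0, bits=32)
result = closed_loop(start, budget, simulated_board, rule_based_agent)
print(result)
```

With these invented numbers, the unmodified starting candidate violates both the flash and power budgets, and the loop only reaches a feasible configuration by reacting to each measurement. The toy model does not reproduce the iteration counts or compression ratios reported above; it only shows the control flow of measurement-driven refinement.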