Large artificial intelligence models (LAIMs) are increasingly regarded as the core intelligence engine for embodied AI applications. However, their massive parameter scale and computational demands pose significant challenges for resource-limited embodied agents. To address this issue, we investigate quantization-aware collaborative inference (co-inference) for embodied AI systems. First, we develop a tractable approximation of the quantization-induced inference distortion. Based on this approximation, we derive lower and upper bounds on the quantization rate-inference distortion function, characterizing its dependence on LAIM statistics, including the quantization bit-width. Next, we formulate a joint quantization bit-width and computation frequency design problem under delay and energy constraints, aiming to minimize the distortion upper bound while ensuring tightness through the corresponding lower bound. Extensive evaluations validate the proposed distortion approximation and the derived rate-distortion bounds. In particular, simulations and real-world testbed experiments demonstrate that the proposed joint design effectively balances inference quality, latency, and energy consumption in edge embodied AI systems.