Modern smart AR glasses are evolving into intelligent systems that support foundation-model-based assistance through continuous perception of the user and the surrounding environment. However, this perception-first design creates major bottlenecks: continuously capturing, processing, and storing rich perceptual streams, especially high-resolution egocentric video, imposes substantial power and memory overhead that is difficult to sustain on resource-constrained AR glasses. In this work, we propose EPIC, an efficient egocentric perception system for embodied intelligence on smart AR glasses. EPIC is an algorithm-hardware co-optimization framework that leverages gaze, pose, and inertial signals to infer user intent and retain only the most informative parts of the high-resolution perceptual input, greatly reducing perception overhead. Our results show that, compared with a full-video baseline, EPIC reduces memory footprint by $27.5\times$ and energy consumption by $24.3\times$ on average while preserving assistance accuracy on egocentric video understanding tasks, a key application scenario for embodied intelligence on smart glasses.
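To make the idea of "retaining only the most informative parts" concrete, the snippet below is a minimal, hypothetical sketch of how head-motion signals and gaze could jointly prune a video stream. It is not EPIC's actual selection policy, which this abstract does not specify. Two simple stand-ins are assumed. The first is a gyroscope gate that drops frames captured during fast head rotation. The second is a fixed-size window centered on a normalized gaze point. The function names `gaze_crop` and `keep_stable_frames`, the 512-pixel window, the 1 rad/s threshold, and all input data are illustrative assumptions.

```python
from __future__ import annotations

import numpy as np


def gaze_crop(frame: np.ndarray, gaze_xy: tuple[float, float], size: int = 512) -> np.ndarray:
    """Hypothetical gaze-based ROI: crop a size x size window around a gaze point.

    gaze_xy is normalized (x, y) in [0, 1]. The window is shifted inward so it
    never leaves the frame; assumes size <= frame width and height.
    """
    h, w = frame.shape[:2]
    half = size // 2
    gx = int(round(gaze_xy[0] * (w - 1)))
    gy = int(round(gaze_xy[1] * (h - 1)))
    x0 = min(max(gx - half, 0), w - size)  # clamp left edge into [0, w - size]
    y0 = min(max(gy - half, 0), h - size)  # clamp top edge into [0, h - size]
    return frame[y0:y0 + size, x0:x0 + size]


def keep_stable_frames(gyro: np.ndarray, max_rate: float = 1.0) -> np.ndarray:
    """Hypothetical IMU gate: True for frames whose angular speed (rad/s) is below max_rate."""
    return np.linalg.norm(gyro, axis=1) < max_rate


# Toy 1080p RGB clip: 4 frames with synthetic gyroscope and gaze readings.
frames = np.zeros((4, 1080, 1920, 3), dtype=np.uint8)
gyro = np.array([[0.1, 0.0, 0.0], [2.0, 0.5, 0.0], [0.2, 0.1, 0.0], [0.0, 0.0, 0.3]])
gaze = [(0.5, 0.5), (0.9, 0.1), (0.95, 0.95), (0.2, 0.4)]

mask = keep_stable_frames(gyro)  # the second frame (fast head turn) is dropped
crops = [gaze_crop(f, g) for f, g, keep in zip(frames, gaze, mask) if keep]

full_bytes = frames.nbytes
kept_bytes = sum(c.nbytes for c in crops)
print(len(crops), full_bytes / kept_bytes)  # prints: 3 10.546875
```

The roughly 10.5x reduction here follows only from the toy numbers: three 512x512 crops replace four 1920x1080 frames. It is not a reproduction of the $27.5\times$ figure reported for EPIC. A real system would also have to decide the window size, the scene context to keep outside the gaze region, and the hardware path that avoids reading out discarded pixels in the first place.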