Embodied agents must close a perception-to-action loop on embedded hardware under tight latency, memory, and energy budgets, which makes deployment a system-level co-design problem rather than a model-accuracy problem. We study this challenge for modular Object Goal Navigation (ObjectNav), where our profiling shows that semantic mapping dominates per-step latency while goal prediction dominates peak memory. We formulate edge embodied navigation deployment as a budget-constrained design-space problem and introduce two orthogonal optimization knobs, SKIP and SCOUT. SKIP is an adaptive sensorimotor scheduler that formalizes safe skipping as a bounded map-impact criterion and learns a lightweight predictor to estimate that criterion from cheap sensor cues at each \texttt{FORWARD} step. This exposes a principled quality-efficiency knob, and depth-based updates are always retained. SCOUT is a sparse-context encoder that couples submanifold sparse convolutions on active map regions with a lightweight dense context stream. On HM3D, across server and embedded platforms, SKIP+SCOUT delivers up to 1.7x end-to-end speedup, 50.5% lower peak memory, and 7.1% higher SPL than the dense baseline at the selected operating point, outperforming naively smaller perception backbones. SKIP transfers to a second modular pipeline (PONI) with near-lossless performance and remains robust under depth-sensor noise. Together, SKIP and SCOUT expose a family of device-aware Pareto operating points for edge physical AI systems.
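The skip decision described for SKIP can be illustrated with a minimal sketch. This is not the authors' implementation: the linear predictor, the threshold name `tau`, the class `SkipScheduler`, the function `navigation_step`, and the rule that non-\texttt{FORWARD} actions never skip are all assumptions made for illustration. The sketch only shows the gating pattern the abstract states: estimate map impact from cheap cues at a \texttt{FORWARD} step, skip the semantic update when the estimate stays within a bound, and always keep the depth-based update.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SkipScheduler:
    """Hypothetical gate: skip the semantic update when predicted map impact is small."""

    weights: Sequence[float]  # assumed linear predictor weights over cheap sensor cues
    tau: float                # assumed bound on tolerated map impact
    bias: float = 0.0

    def predicted_impact(self, cues: Sequence[float]) -> float:
        # Stand-in for the learned lightweight predictor (assumed linear here).
        score = self.bias + sum(w * c for w, c in zip(self.weights, cues))
        return max(0.0, score)

    def should_skip(self, action: str, cues: Sequence[float]) -> bool:
        # Assumption: skipping is only considered at FORWARD steps.
        if action != "FORWARD":
            return False
        return self.predicted_impact(cues) <= self.tau


def navigation_step(
    scheduler: SkipScheduler,
    action: str,
    cues: Sequence[float],
    update_depth: Callable[[], None],
    update_semantics: Callable[[], None],
) -> bool:
    """Run one mapping step; return True if the semantic update was skipped."""
    update_depth()  # depth-based updates are always retained
    if scheduler.should_skip(action, cues):
        return True
    update_semantics()  # the expensive semantic mapping update
    return False
```

Raising `tau` in this sketch skips more semantic updates and lowers per-step cost, while lowering it keeps the map closer to the always-update baseline, which is the kind of quality-efficiency knob the abstract describes.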