Offloading computation to edge servers is a promising way to support the growing number of video understanding applications on resource-constrained IoT devices. Recent work has improved the scalability of such systems by reducing inference costs on edge servers. However, existing approaches are not directly applicable to pixel-level vision tasks such as video semantic segmentation (VSS), partly because dynamic video content causes both VSS accuracy and segment bitrate to fluctuate. In response, we present Penance, a new framework for reducing edge inference cost. By exploiting the softmax outputs of VSS models and the prediction mechanism of H.264/AVC codecs, Penance optimizes model selection and compression settings to minimize inference cost while meeting the required accuracy within the available bandwidth. We implement Penance on a commercial IoT device equipped only with CPUs. Experimental results show that Penance consumes only 6.8% more computation resources than the optimal strategy while satisfying the accuracy and bandwidth constraints with a low failure rate.
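
The constrained selection described above can be illustrated with a minimal sketch. This is not Penance's actual algorithm: it is a brute-force search over a small candidate set, and it assumes that per-candidate estimates of cost, accuracy, and bitrate are already available. The `Candidate` fields, the `select_configuration` helper, the model names, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One (model, compression setting) pair with assumed estimates."""
    model: str           # hypothetical VSS model name
    qp: int              # hypothetical H.264 quantization parameter
    cost: float          # estimated inference cost on the edge server
    accuracy: float      # estimated segmentation accuracy in [0, 1]
    bitrate_kbps: float  # estimated bitrate of the encoded segment


def select_configuration(
    candidates: Iterable[Candidate],
    min_accuracy: float,
    bandwidth_kbps: float,
) -> Optional[Candidate]:
    """Return the cheapest candidate that satisfies both constraints.

    Returns None when no candidate is feasible.
    """
    feasible = [
        c for c in candidates
        if c.accuracy >= min_accuracy and c.bitrate_kbps <= bandwidth_kbps
    ]
    if not feasible:
        return None
    # Among feasible candidates, minimize the estimated inference cost.
    return min(feasible, key=lambda c: c.cost)


# Made-up estimates for two models at two compression levels.
CANDIDATES = [
    Candidate("small", qp=30, cost=10.0, accuracy=0.62, bitrate_kbps=800.0),
    Candidate("small", qp=24, cost=10.0, accuracy=0.68, bitrate_kbps=1500.0),
    Candidate("large", qp=30, cost=40.0, accuracy=0.72, bitrate_kbps=800.0),
    Candidate("large", qp=24, cost=40.0, accuracy=0.78, bitrate_kbps=1500.0),
]

if __name__ == "__main__":
    choice = select_configuration(CANDIDATES, min_accuracy=0.70, bandwidth_kbps=1000.0)
    print(choice)
```

With these made-up numbers, a tight bandwidth budget forces the larger model at the stronger compression level, while a looser budget lets the cheaper model meet a lower accuracy target. In practice the accuracy and bitrate of each candidate change with the video content, which is the difficulty the abstract highlights, so the estimates would have to be refreshed per segment.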