Video analytics demands substantial computing resources, which poses significant challenges in resource-constrained environments. In this paper, to achieve high accuracy at an acceptable computational workload, we propose a cost-effective region-of-interest (RoI) extraction and adaptive inference scheme based on informative encoding metadata. Specifically, to enable efficient RoI-based analytics, we exploit motion vectors from the encoding metadata to identify RoIs in non-reference frames through a morphological opening operation. Furthermore, because RoI content varies and therefore calls for inference by models of different sizes, we measure RoI complexity using the bitrate allocation information in the encoding metadata. Finally, we design an algorithm that prioritizes scheduling each RoI to a model of appropriate complexity, balancing accuracy and latency. Extensive experimental results show that the proposed scheme reduces latency by nearly 40% and improves accuracy by 2.2% on average, outperforming the latest benchmarks.
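The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 3x3 structuring element, the per-block grids `mv_mag` and `bits`, the latency costs in `COST_MS`, and the greedy budget rule are all assumptions made for illustration. The sketch thresholds per-block motion-vector magnitudes, applies a morphological opening (erosion followed by dilation) to remove isolated noise, takes connected components as RoIs, scores each RoI by its average allocated bits, and then upgrades the most complex RoIs to a larger model while a hypothetical latency budget allows.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive block indices

# Hypothetical per-RoI inference latencies in milliseconds (not from the paper).
COST_MS = {"small": 5.0, "large": 20.0}


def _morph(mask: Grid, use_min: bool) -> Grid:
    """3x3 erosion (use_min=True) or dilation (use_min=False); cells outside the frame count as 0."""
    h, w = len(mask), len(mask[0])
    pick = min if use_min else max
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                mask[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(pick(window))
        out.append(row)
    return out


def opening(mask: Grid) -> Grid:
    """Morphological opening: erosion followed by dilation."""
    return _morph(_morph(mask, True), False)


def extract_rois(mv_mag: List[List[float]], thresh: float) -> List[Box]:
    """Threshold motion magnitudes, open the mask, return one box per 4-connected component."""
    mask = opening([[1 if m > thresh else 0 for m in row] for row in mv_mag])
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            stack = [(y, x)]
            seen[y][x] = True
            t, l, b, r = y, x, y, x
            while stack:  # flood fill while growing the bounding box
                cy, cx = stack.pop()
                t, l, b, r = min(t, cy), min(l, cx), max(b, cy), max(r, cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((t, l, b, r))
    return boxes


def roi_complexity(bits: List[List[float]], box: Box) -> float:
    """Assumed complexity proxy: average bits allocated per block inside the RoI."""
    t, l, b, r = box
    total = sum(bits[y][x] for y in range(t, b + 1) for x in range(l, r + 1))
    return total / ((b - t + 1) * (r - l + 1))


def schedule(boxes: List[Box], bits: List[List[float]],
             budget_ms: float, complex_thresh: float) -> Dict[Box, str]:
    """Greedy sketch: every RoI gets the small model; the most complex ones are upgraded first."""
    ranked = sorted(boxes, key=lambda bx: roi_complexity(bits, bx), reverse=True)
    spare = budget_ms - COST_MS["small"] * len(ranked)
    upgrade = COST_MS["large"] - COST_MS["small"]
    plan: Dict[Box, str] = {}
    for bx in ranked:
        if roi_complexity(bits, bx) >= complex_thresh and spare >= upgrade:
            plan[bx] = "large"
            spare -= upgrade
        else:
            plan[bx] = "small"
    return plan
```

In practice the motion vectors and per-block bit counts would come from the decoder's side information, and the thresholds would be tuned per codec and content; the sketch only shows how the two metadata signals could feed RoI extraction and model selection.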