With advances in artificial intelligence (AI) and the growing popularity of camera-equipped devices, many edge video analytics applications are emerging, calling for the deployment of computation-intensive AI models at the network edge. Edge inference is a promising way to offload these workloads from low-end devices to a powerful edge server, but device-server communication remains a bottleneck because of limited bandwidth. This paper proposes a task-oriented communication framework for edge video analytics, in which multiple devices collect visual sensory data and transmit informative features to an edge server for processing. To enable low-latency inference, the framework removes video redundancy in the spatial and temporal domains and transmits only the information that is essential for the downstream task, rather than reconstructing the videos at the edge server. Specifically, it extracts compact task-relevant features based on the deterministic information bottleneck (IB) principle, which characterizes a tradeoff between the informativeness of the features and the communication cost. Because the features of consecutive frames are temporally correlated, we propose a temporal entropy model (TEM) that reduces the bitrate by using the previous features as side information in feature encoding. To further improve inference performance, we build a spatial-temporal fusion module at the server that integrates the features of the current and previous frames for joint inference. Extensive experiments on video analytics tasks show that the proposed framework effectively encodes the task-relevant information in video data and achieves a better rate-performance tradeoff than existing methods.
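The intuition behind using previous features as side information can be illustrated with a small toy sketch. It is not the paper's TEM, which is a learned model. It only compares two empirical entropy estimates for a synthetic, temporally correlated sequence of quantized symbols. The random-walk source, the clipping range, and all function names below are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy (bits per symbol) of an empirical count table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def simulate_features(num_frames, seed=0):
    """Toy stand-in for one quantized feature element tracked across frames.

    Assumption: the value drifts by -1, 0, or +1 between frames and is
    clipped to [-8, 8], so consecutive frames are strongly correlated.
    """
    rng = random.Random(seed)
    value, sequence = 0, []
    for _ in range(num_frames):
        value = max(-8, min(8, value + rng.choice((-1, 0, 0, 1))))
        sequence.append(value)
    return sequence


def rate_estimates(sequence):
    """Return (marginal, conditional) entropy estimates in bits per symbol."""
    pairs = list(zip(sequence[:-1], sequence[1:]))  # (previous, current)
    marginal = entropy_bits(Counter(cur for _, cur in pairs))
    joint = entropy_bits(Counter(pairs))
    previous = entropy_bits(Counter(prev for prev, _ in pairs))
    # Chain rule: H(current | previous) = H(previous, current) - H(previous)
    return marginal, joint - previous


features = simulate_features(20000)
marginal_bits, conditional_bits = rate_estimates(features)
print(f"without side information: {marginal_bits:.3f} bits/symbol")
print(f"with previous frame:      {conditional_bits:.3f} bits/symbol")
```

Conditioning on the previous symbol can only lower, never raise, the entropy estimated from the same samples, so the second figure is a lower bound on the rate an ideal entropy coder would need when it has access to the previous frame's features. How close a learned entropy model gets to that bound depends on how well it predicts the conditional distribution.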