ILCAS: Imitation Learning-Based Configuration-Adaptive Streaming for Live Video Analytics with Cross-Camera Collaboration

Source: arXiv. This article has been accepted for publication in IEEE Transactions on Mobile Computing. Citation information: DOI 10.1109/TMC.2023.3327097

High-accuracy, resource-intensive deep neural networks (DNNs) have been widely adopted in live video analytics (VA), where camera videos are streamed over the network to resource-rich edge/cloud servers for DNN inference. Common video encoding configurations (e.g., resolution and frame rate) have been shown to significantly affect the trade-off between bandwidth consumption and inference accuracy, so their adaptation scheme has been a focus of optimization. However, previous profiling-based solutions suffer from high profiling cost, while existing deep reinforcement learning (DRL) based solutions may perform poorly because they train the agent with a fixed reward function, which fails to capture the application goals in various scenarios. In this paper, we propose ILCAS, the first imitation learning (IL) based configuration-adaptive VA streaming system. Unlike DRL-based solutions, ILCAS trains the agent with demonstrations collected from an expert, which is designed as an offline optimal policy that solves the configuration adaptation problem through dynamic programming. To tackle the challenge of video content dynamics, ILCAS derives motion feature maps from motion vectors, which allow ILCAS to visually "perceive" video content changes. Moreover, ILCAS incorporates a cross-camera collaboration scheme that exploits the spatio-temporal correlations among cameras to select more suitable configurations. Extensive experiments confirm the superiority of ILCAS over state-of-the-art solutions, with a 2-20.9% improvement in mean accuracy and a 19.9-85.3% reduction in chunk upload lag.
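To make the idea of an offline expert concrete, the following is a minimal sketch of how an offline policy could pick per-chunk configurations by dynamic programming when the whole trace is known in advance. It is not the formulation used in ILCAS: the objective (accuracy minus a weighted upload lag), the lag model, the discretization step, and all names (`offline_expert`, `acc`, `size`, `bw`, `lag_weight`, `lag_step`) are assumptions made for illustration only.

```python
from functools import lru_cache


def offline_expert(acc, size, bw, chunk_sec=1.0, lag_weight=1.0, lag_step=0.1):
    """Pick one configuration per chunk for a trace that is fully known offline.

    All inputs are hypothetical profiles, not data from the paper:
      acc[t][c]  : inference accuracy of chunk t under configuration c
      size[t][c] : encoded size of chunk t under configuration c (megabits)
      bw[t]      : bandwidth available while chunk t uploads (Mbps)
    Returns (total_value, list_of_chosen_configuration_indices).
    """
    num_chunks = len(acc)

    @lru_cache(maxsize=None)
    def solve(t, lag_bucket):
        # State: chunk index and the upload lag carried over, in lag_step units.
        if t == num_chunks:
            return 0.0, ()
        best = None
        for c in range(len(acc[t])):
            upload_sec = size[t][c] / bw[t]
            # Assumed lag model: backlog grows when upload outlasts the chunk.
            lag = max(0.0, lag_bucket * lag_step + upload_sec - chunk_sec)
            future_value, tail = solve(t + 1, round(lag / lag_step))
            value = acc[t][c] - lag_weight * lag + future_value
            if best is None or value > best[0]:
                best = (value, (c,) + tail)
        return best

    total_value, configs = solve(0, 0)
    return total_value, list(configs)


if __name__ == "__main__":
    # Two chunks, two configurations (index 0 = cheap, index 1 = high quality).
    accuracy = [[0.5, 0.9], [0.5, 0.9]]
    megabits = [[0.5, 2.0], [0.5, 2.0]]
    bandwidth = [4.0, 1.0]  # bandwidth drops for the second chunk
    print(offline_expert(accuracy, megabits, bandwidth))
```

In this toy trace the sketch selects the high-quality configuration while bandwidth is 4 Mbps and falls back to the cheap one once bandwidth drops to 1 Mbps. Under this assumed setup, the chosen indices paired with the observations at each chunk would form the (state, action) demonstrations that an imitation learner is trained on. The recursion is only suitable for short traces; a longer trace would call for an iterative table.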
