Deep learning inference on streaming media data, such as object detection in video or LiDAR feeds and text extraction from audio waveforms, is now ubiquitous. To achieve high inference accuracy, these applications typically require significant network bandwidth to gather high-fidelity data and extensive GPU resources to run deep neural networks (DNNs). Although this demand could be substantially reduced by optimally adapting configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: adapting configurations (i) with minimal extra GPU or bandwidth overhead, (ii) to near-optimal decisions based on how the data affects the final DNN's accuracy, and (iii) across a range of configuration knobs. This paper presents OneAdapt, which meets these requirements by using a gradient-ascent strategy to adapt configuration knobs. The key idea is to exploit the differentiability of DNNs to quickly estimate the gradient of accuracy with respect to each configuration knob, called AccGrad. Specifically, OneAdapt estimates AccGrad by multiplying two gradients: InputGrad (i.e., how each configuration knob affects the input to the DNN) and DNNGrad (i.e., how the DNN input affects the DNN inference output). We evaluate OneAdapt across five types of configurations, four analytic tasks, and five types of input data. Compared to state-of-the-art adaptation schemes, OneAdapt cuts bandwidth and GPU usage by 15-59% while maintaining comparable accuracy, or improves accuracy by 1-5% while using equal or fewer resources.
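The chain-rule idea behind AccGrad can be illustrated with a minimal toy sketch. It is not OneAdapt's implementation: the knob model (`apply_knob`), the one-layer `toy_dnn`, the accuracy proxy, and the constants `SIGNAL` and `WEIGHTS` are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. Here InputGrad is approximated by a finite difference and DNNGrad is derived analytically, with the accuracy proxy folded into it; a real system would presumably obtain DNNGrad by backpropagation and would also weigh resource cost, which is left out here.

```python
import math

def apply_knob(signal, knob):
    """Hypothetical knob: 1.0 keeps full fidelity, 0.0 flattens the signal to its mean."""
    mean = sum(signal) / len(signal)
    return [mean + knob * (s - mean) for s in signal]

def toy_dnn(x, weights):
    """Toy differentiable 'DNN': a single tanh unit."""
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)))

def accuracy(x, weights, ref_output):
    """Accuracy proxy (an assumption): negative squared gap to the full-fidelity output."""
    return -(toy_dnn(x, weights) - ref_output) ** 2

def dnn_grad(x, weights, ref_output):
    """DNNGrad stand-in: d(accuracy)/d(x_i), derived by hand for the toy model."""
    out = toy_dnn(x, weights)
    scale = -2.0 * (out - ref_output) * (1.0 - out ** 2)
    return [scale * w for w in weights]

def input_grad(signal, knob, eps=1e-6):
    """InputGrad stand-in: d(x_i)/d(knob), via a forward finite difference."""
    lo = apply_knob(signal, knob)
    hi = apply_knob(signal, knob + eps)
    return [(h - l) / eps for h, l in zip(hi, lo)]

def acc_grad(signal, knob, weights, ref_output):
    """AccGrad as the product (inner product over input elements) of the two gradients."""
    x = apply_knob(signal, knob)
    d_acc_dx = dnn_grad(x, weights, ref_output)
    dx_dknob = input_grad(signal, knob)
    return sum(d * i for d, i in zip(d_acc_dx, dx_dknob))

# Hypothetical data: a four-sample "input" and fixed toy weights.
SIGNAL = [0.2, 0.9, 0.4, 0.7]
WEIGHTS = [0.5, -1.0, 0.8, 0.3]
REF_OUTPUT = toy_dnn(apply_knob(SIGNAL, 1.0), WEIGHTS)

def ascent_step(knob, lr=0.5):
    """One gradient-ascent update on the knob, clamped to [0, 1]."""
    g = acc_grad(SIGNAL, knob, WEIGHTS, REF_OUTPUT)
    return min(1.0, max(0.0, knob + lr * g))

if __name__ == "__main__":
    knob = 0.5
    print("AccGrad at knob=0.5:", acc_grad(SIGNAL, knob, WEIGHTS, REF_OUTPUT))
    print("knob after one ascent step:", ascent_step(knob))
```

In this sketch, estimating the gradient costs one extra application of the knob and one gradient evaluation of the model, rather than a full inference run per candidate configuration, which is the kind of overhead saving the abstract's first requirement refers to.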