Deep neural networks (DNNs) are widely used in video analytics tasks, many of which demand real-time responses. Because mobile devices have limited processing power, a common way to support such real-time analytics is to offload the processing to an edge server. This paper examines how to speed up DNN processing on an edge server that serves multiple clients. In particular, we observe that batching multiple DNN requests significantly reduces processing time. Based on this observation, we first design a novel scheduling algorithm that exploits the batching benefits across all requests that run the same DNN. This is compelling because there are only a handful of DNNs, so many requests tend to use the same DNN. Our algorithms are general and support different objectives, such as minimizing the completion time or maximizing the on-time ratio. We then extend our algorithm to handle requests that use different DNNs, with or without shared layers. Finally, we develop a collaborative approach that further improves performance by adaptively processing some requests, or portions of requests, locally on the clients. This is especially useful when the network and/or the server is congested. Our implementation demonstrates the effectiveness of our approach under different request distributions (e.g., Poisson, Pareto, and constant inter-arrival times).
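To make the batching observation concrete, here is a minimal sketch under an assumed linear latency model: each batch pays a fixed overhead plus a per-request cost. The constants `SETUP_MS` and `PER_REQUEST_MS`, the model names, and the helper functions are all hypothetical illustrations; this is not the scheduling algorithm proposed in the paper. It only shows why grouping requests that run the same DNN into one batch amortizes the fixed overhead compared with serving each request on its own.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical latency model (an assumption, not a measurement from the paper):
# a batch of size b costs a fixed setup overhead plus a per-request cost.
SETUP_MS = 20.0
PER_REQUEST_MS = 5.0


def batch_latency_ms(batch_size: int) -> float:
    """Latency of running one batch of `batch_size` requests on a single DNN."""
    return SETUP_MS + PER_REQUEST_MS * batch_size


@dataclass
class Request:
    req_id: int
    model: str  # hypothetical name of the DNN this request needs


def group_by_model(requests):
    """Group pending requests so that all requests for the same DNN form one batch."""
    groups = defaultdict(list)
    for r in requests:
        groups[r.model].append(r)
    return dict(groups)


def total_time_sequential(requests) -> float:
    # Every request runs alone, so each one pays the setup overhead.
    return sum(batch_latency_ms(1) for _ in requests)


def total_time_batched(requests) -> float:
    # One batch per distinct DNN; batches run back to back on the server.
    return sum(batch_latency_ms(len(g)) for g in group_by_model(requests).values())


if __name__ == "__main__":
    models = ["resnet", "resnet", "yolo", "resnet", "yolo"]  # hypothetical labels
    reqs = [Request(i, m) for i, m in enumerate(models)]
    print(total_time_sequential(reqs))  # 5 * (20 + 5) = 125.0
    print(total_time_batched(reqs))     # (20 + 15) + (20 + 10) = 65.0
```

Under this assumed model, batching cuts the total from 125 ms to 65 ms for the five requests, because only two setup overheads are paid instead of five. A real scheduler would also have to account for request deadlines and arrival times, which this sketch ignores.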