As deep neural networks (DNNs) are applied to a wide range of edge intelligence applications, it is critical for edge inference platforms to deliver high throughput and low latency at the same time. Edge platforms that serve multiple DNN models pose new challenges for scheduler design. First, each request may carry a different service level objective (SLO) that must be met to improve quality of service (QoS). Second, the platform should schedule multiple heterogeneous DNN models efficiently so that system utilization improves. To meet these two goals, this paper proposes BCEdge, a novel learning-based scheduling framework that combines adaptive batching with concurrent execution of DNN inference services on edge platforms. We define a utility function to evaluate the trade-off between throughput and latency. The scheduler in BCEdge leverages maximum entropy-based deep reinforcement learning (DRL) to maximize utility by automatically co-optimizing 1) the batch size and 2) the number of concurrent models. Our prototype, implemented on different edge platforms, shows that BCEdge improves utility by up to 37.6% on average compared to state-of-the-art solutions, while satisfying SLOs.
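To make the co-optimization idea concrete, the following is a minimal sketch under stated assumptions, not BCEdge's actual method. The abstract does not give the utility function or the latency behavior, so `toy_utility` and `toy_latency_ms` are hypothetical stand-ins, and a plain exhaustive search over (batch size, concurrency) pairs replaces the maximum entropy-based DRL scheduler.

```python
from itertools import product


def toy_utility(throughput_rps, latency_ms, slo_ms):
    """Hypothetical utility (an assumption, not the paper's definition):
    throughput per unit latency, or zero when the SLO is violated."""
    if latency_ms > slo_ms:
        return 0.0
    return throughput_rps / latency_ms


def toy_latency_ms(batch_size, num_concurrent):
    """Made-up cost model: fixed overhead plus a per-sample cost,
    inflated by contention between concurrently running models."""
    base = 5.0 + 2.0 * batch_size
    contention = 1.0 + 0.5 * (num_concurrent - 1)
    return base * contention


def best_config(batch_sizes, concurrency_levels, slo_ms):
    """Exhaustively score every (batch size, concurrency) pair and
    return (utility, batch_size, num_concurrent) for the best one."""
    best = None
    for b, c in product(batch_sizes, concurrency_levels):
        latency = toy_latency_ms(b, c)
        # b requests per batch, c batches in flight, one round per `latency` ms.
        throughput = 1000.0 * b * c / latency
        u = toy_utility(throughput, latency, slo_ms)
        if best is None or u > best[0]:
            best = (u, b, c)
    return best


if __name__ == "__main__":
    u, b, c = best_config([1, 2, 4, 8], [1, 2, 3], slo_ms=30.0)
    print(f"batch_size={b}, num_concurrent={c}, utility={u:.2f}")
```

In this toy model, larger batches and more concurrency both raise throughput but also raise latency, so the best configuration sits between the extremes. A learned scheduler such as the one the abstract describes would be expected to find such a point without enumerating every configuration.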