The transition from monolithic architectures to microservices has made application design more flexible and execution more scalable. Microservices are typically deployed on a computing cluster managed by a container orchestration platform. However, this shift introduces significant challenges, particularly in scheduling containerized services efficiently. These challenges are compounded by unpredictable conditions, such as dynamic incoming workloads with varying execution traffic and variable communication delays among cluster nodes. Existing work often overlooks both the real-time traffic that dynamic requests impose on running microservices and the differing communication delays across cluster nodes. Consequently, even optimally deployed microservices can suffer significant performance degradation over time. To address these issues, we propose TraDE, a network- and traffic-aware adaptive scheduling framework that redeploys microservice instances to maintain the desired performance as traffic and network conditions change within the hosting cluster. We have implemented TraDE as an extension to the Kubernetes platform. We also deployed realistic microservice applications on a real compute cluster and conducted extensive experiments to assess the framework's performance in various scenarios. The results demonstrate that TraDE effectively reschedules running microservices to improve end-to-end performance while maintaining a high goodput ratio. Compared with the existing method NetMARKS, TraDE reduces the application's average response time by up to 48.3% and improves throughput by up to 1.2-1.5x across workloads, while maintaining a goodput ratio of 95.36% and adapting robustly to meet QoS targets under sustained workloads and dynamic networking conditions.