TraDE: Network and Traffic-aware Adaptive Scheduling for Microservices Under Dynamics

The transition from monolithic architectures to microservices has made application design more flexible and execution more scalable. Microservices are typically deployed on a computing cluster managed by a container orchestration platform. However, this shift introduces significant challenges, particularly in scheduling containerized services efficiently. These challenges are compounded by unpredictable conditions, such as dynamic incoming workloads that generate varying execution traffic and variable communication delays among cluster nodes. Existing works often overlook the real-time traffic impact of dynamic requests on running microservices, as well as the varied communication delays across cluster nodes. Consequently, even optimally deployed microservices can suffer significant performance degradation over time. To address these issues, we introduce TraDE, a network and traffic-aware adaptive scheduling framework. TraDE adaptively redeploys microservice containers to maintain the desired performance as traffic and network conditions change within the hosting cluster. We have implemented TraDE as an extension to the Kubernetes platform. We also deployed realistic microservice applications in a real compute cluster and conducted extensive experiments to assess the framework's performance in various scenarios. The results demonstrate that TraDE effectively reschedules running microservices to enhance end-to-end performance while maintaining a high goodput ratio. Compared with the existing method NetMARKS, TraDE reduces the application's average response time by up to 48.3% and improves throughput by up to 1.4x, while maintaining a goodput ratio of 95.36% and showing robust adaptive capability under sustained workloads.
