An Interference-aware Approach for Co-located Container Orchestration with Novel Metric

Container orchestration technologies are widely used in cloud computing, where they enable online and offline services to be co-located on the same infrastructure. Online services demand fast response and high availability, whereas offline services require extensive computational resources. This mixed deployment can cause resource contention that degrades the performance of online services, and the metrics used by existing methods do not accurately reflect the extent of the interference. In this paper, we introduce scheduling latency as a novel metric for quantifying interference and compare it with existing metrics. Empirical evidence shows that scheduling latency reflects the performance degradation of online services more accurately. Building on this metric, we propose a method for quantifying node-level interference, and we use several machine learning techniques to predict the interference an online service would experience on a specific host, which provides reference information for subsequent scheduling decisions. To improve resource utilization, we train a model for online services that predicts CPU and memory allocation from the workload type and queries per second (QPS). Finally, we present a scheduling algorithm based on these predictive models that aims to reduce interference for online services while balancing node resource utilization. In experiments against three baseline methods, our approach reduces the average, 90th-percentile, and 99th-percentile response times of online services by 29.4%, 31.4%, and 14.5%, respectively.
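
To make the idea of interference-aware placement concrete, the following is a minimal illustrative sketch, not the algorithm proposed in this paper. It assumes that a predictive model has already produced a scheduling-latency estimate for each candidate node, and it combines that estimate with a simple CPU/memory imbalance term. The names `Node`, `imbalance`, `pick_node`, the field `predicted_latency`, and the weight `alpha` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_capacity: float       # total CPU cores
    mem_capacity: float       # total memory (GiB)
    cpu_used: float           # CPU cores already allocated
    mem_used: float           # memory already allocated (GiB)
    predicted_latency: float  # assumed model output: predicted scheduling latency (ms)


def imbalance(node: Node, cpu_req: float, mem_req: float) -> float:
    """Gap between CPU and memory utilization after a hypothetical placement."""
    cpu_util = (node.cpu_used + cpu_req) / node.cpu_capacity
    mem_util = (node.mem_used + mem_req) / node.mem_capacity
    return abs(cpu_util - mem_util)


def pick_node(nodes: List[Node], cpu_req: float, mem_req: float,
              alpha: float = 0.7) -> Optional[Node]:
    """Return the feasible node with the lowest combined score, or None.

    alpha weights predicted interference against resource imbalance;
    its default value is an arbitrary assumption, not a tuned setting.
    """
    # Keep only nodes that can actually hold the requested resources.
    feasible = [
        n for n in nodes
        if n.cpu_used + cpu_req <= n.cpu_capacity
        and n.mem_used + mem_req <= n.mem_capacity
    ]
    if not feasible:
        return None

    # Normalize latency to [0, 1] so both score terms share a scale.
    max_latency = max(n.predicted_latency for n in feasible) or 1.0

    def score(n: Node) -> float:
        interference = n.predicted_latency / max_latency
        return alpha * interference + (1 - alpha) * imbalance(n, cpu_req, mem_req)

    return min(feasible, key=score)


if __name__ == "__main__":
    cluster = [
        Node("a", 8, 16, 2, 4, predicted_latency=5.0),
        Node("b", 8, 16, 2, 4, predicted_latency=1.0),
        Node("c", 8, 16, 7.5, 4, predicted_latency=0.1),  # too little free CPU
    ]
    chosen = pick_node(cluster, cpu_req=1, mem_req=2)
    print(chosen.name if chosen else "no feasible node")
```

In this sketch a lower score is better: a node is preferred when its predicted scheduling latency is low and when placing the service keeps its CPU and memory utilization close to each other. How the latency prediction, the resource prediction, and the weighting are actually obtained is left to the models described in the paper.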
