Modern cloud servers routinely co-locate multiple latency-sensitive microservice instances to improve resource efficiency. However, the diversity of microservice behaviors, coupled with mutual performance interference under simultaneous multithreading (SMT), makes large-scale placement increasingly complex. Existing interference-aware schedulers and isolation techniques rely on coarse core-level profiling or static resource partitioning, leaving asymmetric hyperthread-level heterogeneity and SMT contention dynamics largely unmodeled. We present Hestia, a hyperthread-level, interference-aware scheduling framework powered by self-attention. Through an extensive analysis of production traces encompassing 32,408 instances across 3,132 servers, we identify two dominant contention patterns -- sharing-core (SC) and sharing-socket (SS) -- and reveal strong asymmetry in their impact. Guided by these insights, Hestia incorporates (1) a self-attention-based CPU usage predictor that models SC/SS contention and hardware heterogeneity, and (2) an interference scoring model that estimates pairwise contention risks to guide scheduling decisions. We evaluate Hestia through large-scale simulation and a real production deployment. Hestia reduces the 95th-percentile service latency by up to 80\%, lowers overall CPU consumption by 2.3\% under the same workload, and outperforms five state-of-the-art schedulers by up to 30.65\% across diverse contention scenarios.