Modern multi-tenant, hardware-heterogeneous computing environments pose significant challenges for effective workload orchestration. Simple heuristics for assessing workload performance, such as CPU utilization or application-level metrics, are often insufficient to capture the complex performance dynamics arising from resource contention and noisy-neighbor effects. In such environments, a performance bottleneck may emerge in any shared system resource, leading to unexpected and difficult-to-diagnose degradation. This paper introduces buoyancy, a novel abstraction for characterizing workload performance in multi-tenant systems. Unlike traditional approaches, buoyancy integrates application-level metrics with system-level insight into shared-resource contention to provide a holistic view of performance dynamics. By explicitly capturing bottlenecks and headroom across multiple resources, buoyancy facilitates resource-aware and application-aware orchestration in a manner that is intuitive, extensible, and generalizable across heterogeneous platforms. We evaluate buoyancy on representative multi-tenant workloads to illustrate its ability to expose performance-limiting resource interactions. On average, buoyancy provides a 19.3% better indication of bottlenecks than traditional heuristics. We additionally show how buoyancy can act as a drop-in replacement for conventional performance metrics, enabling improved observability and more informed scheduling and optimization decisions.