Accurate prediction of application performance is critical for effective scheduling and resource management in resource-constrained, dynamic edge environments. However, achieving predictable performance in such environments remains challenging due to the co-location of multiple applications and node heterogeneity. To address this, we propose a methodology that automatically builds and assesses various performance predictors, prioritizing both accuracy and inference time to identify the most efficient model. Our predictors achieve up to 90% accuracy while maintaining an inference time of less than 1% of the round-trip time. The predictors are trained on the historical state of the monitoring metrics most correlated with application performance and are evaluated across multiple servers in dynamic co-location scenarios. As a use case, we consider electron microscopy (EM) workflows, which have stringent real-time demands and diverse resource requirements. Our findings emphasize the need for a systematic methodology that selects server-specific predictors by jointly optimizing accuracy and inference latency in dynamic co-location scenarios. Integrating such predictors into edge environments can improve resource utilization and yield predictable performance.
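One way to read the joint criterion on accuracy and inference latency is as a constrained selection: discard every candidate whose inference time exceeds a fixed fraction of the round-trip time, then keep the most accurate of the rest. The following is a minimal sketch under that assumption, not the implementation described in this work. The names `Evaluation` and `select_predictor`, the tie-breaking rule, and all numbers in the usage example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Evaluation:
    """Measured quality of one candidate predictor on one server (hypothetical record)."""
    name: str
    accuracy: float      # fraction in [0, 1], measured on held-out data
    inference_s: float   # mean inference time per prediction, in seconds


def select_predictor(
    evaluations: Sequence[Evaluation],
    rtt_s: float,
    budget_fraction: float = 0.01,  # assumption: 1% of the round-trip time
) -> Optional[Evaluation]:
    """Return the most accurate candidate whose inference time fits the budget.

    Ties in accuracy are broken in favor of the faster candidate.
    Returns None when no candidate meets the latency budget.
    """
    budget_s = budget_fraction * rtt_s
    feasible = [e for e in evaluations if e.inference_s < budget_s]
    if not feasible:
        return None
    return max(feasible, key=lambda e: (e.accuracy, -e.inference_s))


if __name__ == "__main__":
    # Made-up measurements for three candidates on a single server.
    candidates = [
        Evaluation("model_a", accuracy=0.80, inference_s=0.00005),
        Evaluation("model_b", accuracy=0.88, inference_s=0.0004),
        Evaluation("model_c", accuracy=0.95, inference_s=0.004),
    ]
    # With a 50 ms round-trip time the budget is 0.5 ms, so model_c is excluded.
    best = select_predictor(candidates, rtt_s=0.05)
    print(best.name if best else "no feasible predictor")
```

Because the evaluations are per server, running the selection once per server with that server's own measurements gives a server-specific choice. Treating latency as a hard constraint is only one option; a weighted score over accuracy and latency would be an alternative reading of the same criterion.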