In modern distributed systems, efficient resource allocation is vital for maintaining scalability, reducing operational costs, and ensuring fast execution across heterogeneous workloads. Predictive models of resource usage are essential tools for optimizing allocation and preventing system bottlenecks. A key challenge in predictive memory allocation is that its costs are asymmetric: underallocation causes failures, while overallocation wastes memory. We propose a regression method based on an ensemble of LightGBM and XGBoost models trained to predict high conditional quantiles. To further account for the high cost of underallocation, we add a multiplicative safety factor. On a real-world dataset of build jobs provided by SAP, our method reduces the share of underallocated jobs from 4.17% to 2.89% and the average overallocation from 148% to 44.51%. We further explore the Pareto frontier between optimizing for underallocation and optimizing for overallocation.
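To make the asymmetric-cost idea concrete, the following is a minimal sketch using only the Python standard library. It shows the quantile (pinball) loss that training toward a high conditional quantile typically minimizes, a multiplicative safety factor applied to an ensemble prediction, and the two quantities traded off against each other. Everything here is an assumption for illustration: the function names, the averaging of the two model outputs, the metric definitions, and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss at level q, with 0 < q < 1.

    For q close to 1, predicting below the true value costs far more
    than predicting above it by the same margin.
    """
    return mean(
        q * (y - p) if y >= p else (1 - q) * (p - y)
        for y, p in zip(y_true, y_pred)
    )


def allocate(pred_a, pred_b, safety_factor):
    """Combine two quantile predictions and scale them by a safety factor.

    Assumption: the ensemble is a plain average of the two models' outputs.
    """
    return [safety_factor * (a + b) / 2 for a, b in zip(pred_a, pred_b)]


def allocation_metrics(actual, allocated):
    """Return (share of underallocated jobs, mean relative overallocation).

    Assumption: overallocation is averaged over the jobs that received
    at least as much memory as they used.
    """
    under_rate = sum(a < u for u, a in zip(actual, allocated)) / len(actual)
    over = [(a - u) / u for u, a in zip(actual, allocated) if a >= u]
    return under_rate, (mean(over) if over else 0.0)


# Hypothetical peak memory usage (GB) and per-model quantile predictions.
actual = [4.0, 8.0, 2.0, 10.0]
pred_model_a = [4.0, 7.0, 3.0, 9.0]
pred_model_b = [5.0, 7.0, 3.0, 10.0]

# A larger safety factor lowers the underallocation rate at the price of
# more wasted memory; sweeping it traces out a trade-off curve.
for factor in (1.0, 1.1, 1.2):
    allocation = allocate(pred_model_a, pred_model_b, factor)
    print(factor, allocation_metrics(actual, allocation))
```

Sweeping both the quantile level and the safety factor, and keeping only the settings that are not dominated on both metrics, is one simple way to approximate the kind of Pareto frontier the abstract refers to.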