Capacity management is critical for software organizations to allocate resources effectively and meet operational demands. An important step in capacity management is predicting future resource needs, which often relies on data-driven analytics and machine learning (ML) forecasting models. These models require frequent retraining to stay relevant as data evolves. Continuously retraining the forecasting models can be expensive and difficult to scale, posing a challenge for engineering teams tasked with balancing accuracy and efficiency. Retraining only when the data changes appears to be a more computationally efficient alternative, but its impact on accuracy requires further investigation. In this work, we investigate the effects of retraining capacity forecasting models for time series based on detected changes in the data, compared to periodic retraining. Our results show that drift-based retraining achieves forecasting accuracy comparable to periodic retraining in most cases, making it a cost-effective strategy. However, when data is changing rapidly, periodic retraining is still preferred to maximize forecasting accuracy. These findings offer actionable insights for software teams to enhance forecasting systems, reducing retraining overhead while maintaining robust performance.
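To make the contrast between the two retraining policies concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes a toy "model" that forecasts the mean of its training window and a simple mean-shift test as the drift detector; the names `mean_shift_detected` and `simulate`, the window size, the threshold, and the synthetic series are all hypothetical choices made for illustration.

```python
import random
import statistics


def mean_shift_detected(reference, recent, threshold=3.0):
    """Illustrative drift test (an assumption, not the paper's detector):
    flag drift when the recent-window mean departs from the reference mean
    by more than `threshold` standard errors."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference) or 1e-9  # guard against zero spread
    std_err = ref_std / len(recent) ** 0.5
    return abs(statistics.fmean(recent) - ref_mean) > threshold * std_err


def simulate(series, window, policy, period=None):
    """Walk through the series one step at a time.

    The toy model forecasts the mean of its training window. Under the
    "periodic" policy it is retrained every `period` steps; under the
    "drift" policy it is retrained only when the detector fires.
    Returns (mean absolute error, number of retrainings).
    """
    if policy == "periodic" and period is None:
        raise ValueError("periodic policy needs a period")
    train = series[:window]
    model = statistics.fmean(train)
    retrains = 0
    abs_errors = []
    for t in range(window, len(series)):
        abs_errors.append(abs(series[t] - model))   # one-step-ahead error
        recent = series[t - window + 1 : t + 1]     # latest `window` points
        if policy == "periodic":
            retrain = (t - window + 1) % period == 0
        else:
            retrain = mean_shift_detected(train, recent)
        if retrain:
            train = recent
            model = statistics.fmean(train)
            retrains += 1
    return statistics.fmean(abs_errors), retrains


# Synthetic demand series with a single level shift halfway through.
random.seed(0)
series = [random.gauss(100, 5) for _ in range(200)]
series += [random.gauss(150, 5) for _ in range(200)]

for policy, kwargs in (("periodic", {"period": 10}), ("drift", {})):
    mae, retrains = simulate(series, window=30, policy=policy, **kwargs)
    print(f"{policy:>8}: MAE={mae:.2f}, retrainings={retrains}")
```

Comparing the printed error and retraining counts for the two policies mirrors the trade-off described above: the drift policy spends retraining effort only when the detector fires, while the periodic policy pays a fixed cost regardless of whether the data has changed. How the two compare on real capacity data depends on the detector, the model, and how quickly the data changes.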