Pushing the Boundary: Specialising Deep Configuration Performance Learning

Software systems often expose numerous configuration options that can be adjusted to meet different performance requirements. However, understanding the combined impact of these options on performance is challenging, especially when real-world measurements are limited. Deep learning techniques have gained popularity for this problem because they can capture complex relationships even from limited samples. This thesis begins with a systematic literature review of deep learning techniques in configuration performance modeling, analyzing 85 primary papers out of 948 searched papers. The review identifies knowledge gaps, from which the thesis sets three objectives. The first knowledge gap is the lack of understanding about which encoding scheme is better and under what circumstances. To address this, the thesis conducts an empirical study comparing three popular encoding schemes and provides actionable suggestions to support more reliable decisions. The second knowledge gap is the sparsity inherited from the configuration landscape. To handle this, the thesis proposes DaL, a model-agnostic and sparsity-robust framework built on a "divide-and-learn" approach. DaL achieves higher accuracy than state-of-the-art approaches across various real-world systems. The third knowledge gap is the limitation of predicting only under static environments, which the thesis addresses by proposing a sequential meta-learning framework called SeMPL. Unlike traditional meta-learning frameworks, SeMPL trains on the meta-environments in a specialized order, resulting in significantly improved prediction accuracy in multi-environment scenarios. Overall, the thesis identifies and addresses critical knowledge gaps in deep performance learning, significantly advancing the accuracy of performance prediction.
