Misconfigurations are a major cause of software failures. Existing practices rely on developer-written rules or test cases to validate configurations, both of which are expensive. Machine learning (ML) for configuration validation is considered a promising direction, but it faces challenges such as the need for large-scale field data and system-specific models. Recent advances in Large Language Models (LLMs) show promise in addressing some of the long-standing limitations of ML-based configuration validation. We present a first analysis of the feasibility and effectiveness of using LLMs for configuration validation. We empirically evaluate LLMs as configuration validators by developing a generic LLM-based configuration validation framework, named Ciri. Ciri employs effective prompt engineering with few-shot learning based on both valid configuration and misconfiguration data. Ciri checks the outputs of LLMs when producing results, addressing the hallucination and nondeterminism of LLMs. We evaluate Ciri's validation effectiveness on eight popular LLMs using configuration data from ten widely deployed open-source systems. Our analysis (1) confirms the potential of using LLMs for configuration validation, (2) explores the design space of LLM-based validators like Ciri, and (3) reveals open challenges such as ineffectiveness in detecting certain types of misconfigurations and biases towards popular configuration parameters.
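To make the pipeline described above more concrete, the following is a minimal sketch of a few-shot validator that checks LLM outputs before reporting a result. It is not Ciri's actual implementation: the JSON answer format, the majority vote over repeated queries, the function names (`build_prompt`, `parse_answer`, `validate`), the `llm` callable, and the example parameter `server.port` are all assumptions introduced for illustration. Here, discarding malformed answers stands in for the hallucination check, and repeated querying stands in for handling nondeterminism.

```python
import json
from collections import Counter
from typing import Callable, List, Optional, Tuple

# A shot is (configuration text, has_error label, short reason). Hypothetical format.
Shot = Tuple[str, bool, str]


def build_prompt(shots: List[Shot], target: str) -> str:
    """Assemble a few-shot prompt from labeled examples plus the config under test."""
    lines = [
        "Decide whether each configuration is valid. "
        'Answer with JSON: {"has_error": true|false, "reason": "..."}',
        "",
    ]
    for config, has_error, reason in shots:
        lines.append(f"Configuration: {config}")
        lines.append("Answer: " + json.dumps({"has_error": has_error, "reason": reason}))
        lines.append("")
    lines.append(f"Configuration: {target}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_answer(raw: str) -> Optional[dict]:
    """Return the parsed answer, or None if it does not match the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("has_error"), bool):
        return None
    return data


def validate(
    llm: Callable[[str], str],  # assumed interface: prompt in, raw completion out
    shots: List[Shot],
    target: str,
    n_queries: int = 5,
) -> Optional[bool]:
    """Query the model several times, drop malformed answers, and majority-vote.

    Returns True if the configuration is judged erroneous, False if judged valid,
    and None if no well-formed answer was obtained. On a tie, the label that
    appeared first among the well-formed answers wins.
    """
    prompt = build_prompt(shots, target)
    votes = []
    for _ in range(n_queries):
        answer = parse_answer(llm(prompt))
        if answer is not None:
            votes.append(answer["has_error"])
    if not votes:
        return None
    return Counter(votes).most_common(1)[0][0]
```

In use, `shots` would mix valid configurations and known misconfigurations, for example `[("server.port=8080", False, "valid port"), ("server.port=-1", True, "port out of range")]`, and `llm` would wrap whatever model client is available. Returning `None` when every answer is malformed keeps an unparseable response from being silently counted as "valid".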