Boundary value analysis and testing (BVT) is fundamental to software quality assurance because faults tend to cluster at input extremes, yet testers often struggle to understand and justify why certain input-output pairs represent meaningful behavioral boundaries. Large Language Models (LLMs) could help by producing natural-language rationales, but their value for BVT has not been empirically assessed. We therefore conducted an exploratory study of LLM-generated boundary explanations: in a survey, twenty-seven software professionals rated GPT-4.1 explanations for twenty boundary pairs on clarity, correctness, completeness, and perceived usefulness, and six of them elaborated in follow-up interviews. Overall, 63.5% of all ratings were positive (4-5 on a five-point Likert scale) compared with 17% negative (1-2), indicating general agreement but also variability in perceptions. Participants favored explanations that followed a clear structure, cited authoritative sources, and adapted their depth to the reader's expertise; they also stressed the need for actionable examples to support debugging and documentation. From these insights, we distilled a seven-item requirement checklist that defines concrete design criteria for future LLM-based boundary explanation tools. The results suggest that, with further refinement, LLM-based tools can support testing workflows by making boundary explanations more actionable and trustworthy.
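As a minimal sketch of the boundary pairs the abstract refers to (this example is illustrative only and is not taken from the study's materials), consider a hypothetical validator accepting ages in the inclusive range [0, 120]; BVT pairs each boundary with values just inside and just outside it, where behavior is expected to change:

```python
# Hypothetical validator: accepts ages in the inclusive range [0, 120].
def is_valid_age(age: int) -> bool:
    """Return True if age is within the accepted range [0, 120]."""
    return 0 <= age <= 120

# Boundary pairs: for each boundary of the valid range, test the boundary
# value itself and the adjacent value outside it, since faults cluster here.
boundary_pairs = [
    (-1, False),   # just below the lower boundary
    (0, True),     # the lower boundary itself
    (120, True),   # the upper boundary itself
    (121, False),  # just above the upper boundary
]

for value, expected in boundary_pairs:
    assert is_valid_age(value) == expected, f"boundary fault at input {value}"
```

An explanation tool of the kind the study envisions would accompany each such pair with a rationale, e.g., why (120, True) and (121, False) together delimit the upper behavioral boundary.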