As Large Language Models (LLMs) increasingly generate code in software development, ensuring the quality of LLM-generated code has become critical. Traditional testing approaches using Example-based Testing (EBT) often miss edge cases: defects that occur at boundary values, special input patterns, or extreme conditions. This research investigates the characteristics of LLM-generated Property-based Testing (PBT) compared to EBT for exploring edge cases. We analyze 16 HumanEval problems where standard solutions fail on extended test cases, generating both PBT and EBT test code with Claude-4-sonnet. Our experimental results reveal that while each method individually achieved a 68.75\% bug detection rate, combining both approaches improved detection to 81.25\%. The analysis demonstrates complementary characteristics: PBT excels at detecting performance issues and edge cases through extensive input-space exploration, whereas EBT excels at catching specific boundary conditions and special patterns. These findings suggest that a hybrid approach leveraging both testing methods can improve the reliability of LLM-generated code, offering guidance for test generation strategies in LLM-based code generation.
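To make the PBT/EBT contrast concrete, the sketch below pairs an EBT suite of hand-picked examples with a Hypothesis-based PBT suite for a function modeled on HumanEval's sort_even task. This is a minimal illustration under stated assumptions: the implementation, the chosen properties, and the test names are our own, not artifacts from the paper's experiments.

```python
# Illustrative sketch (assumed, not from the paper's artifacts): EBT vs. PBT
# for a sort_even-style task. Requires the `hypothesis` package.
from hypothesis import given, strategies as st

def sort_even(lst):
    """Return lst with the values at even indices sorted, odd indices untouched."""
    result = list(lst)
    result[::2] = sorted(lst[::2])
    return result

# EBT: hand-picked examples pin down specific boundary conditions.
def test_sort_even_examples():
    assert sort_even([]) == []                 # empty input
    assert sort_even([5]) == [5]               # single element
    assert sort_even([3, 1, 2]) == [2, 1, 3]   # odd-length list

# PBT: Hypothesis explores the input space and checks invariants that
# must hold for every generated list.
@given(st.lists(st.integers()))
def test_sort_even_properties(lst):
    out = sort_even(lst)
    assert len(out) == len(lst)                # length is preserved
    assert out[1::2] == lst[1::2]              # odd indices are untouched
    assert out[::2] == sorted(lst[::2])        # even indices come out sorted
```

The example-based test fixes three concrete boundary inputs, while the property-based test delegates input construction to Hypothesis and asserts structural invariants, which is how PBT reaches inputs that hand-written examples tend to miss.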