Software vulnerabilities are often detected via taint analysis, penetration testing, or fuzzing. They can also be found via unit tests that exercise security-sensitive behavior with specific inputs; such tests are called vulnerability-witnessing tests. Generative AI models could help developers write them, but these models need many examples to learn from, and such examples are currently scarce. This paper introduces VuTeCo, an AI-driven framework for collecting examples of vulnerability-witnessing tests from Java repositories. VuTeCo carries out two tasks: (1) the "Finding" task, which determines whether a unit test case is security-related, and (2) the "Matching" task, which relates a test case to the vulnerability it witnesses. VuTeCo addresses the Finding task with UniXcoder, achieving an F0.5 score of 0.73 and a precision of 0.83 on a test set of unit tests from Vul4J. It addresses the Matching task with DeepSeek Coder, achieving an F0.5 score of 0.65 and a precision of 0.75 on a test set of pairs of unit tests and vulnerabilities from Vul4J. VuTeCo has been applied in the wild to 427 Java projects and 1,238 vulnerabilities, yielding 224 test cases confirmed to be security-related and 35 tests correctly matched to 29 vulnerabilities. The validated tests were collected in a new dataset called Test4Vul. VuTeCo lays the foundation for large-scale retrieval of vulnerability-witnessing tests, enabling future AI models to better understand and generate security unit tests.
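Both tasks are evaluated with the F0.5 score reported next to precision. F-beta is the weighted harmonic mean of precision P and recall R, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); with beta = 0.5, precision carries more weight than recall. The minimal Python sketch below illustrates that asymmetry. The helper names `f_beta` and `f_beta_from_counts` are hypothetical and not taken from VuTeCo, and the inputs in the usage note are illustrative values, not results from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 favors precision; beta = 0.5 gives the F0.5 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: score is 0 when both components are 0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f_beta_from_counts(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Compute F-beta from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_beta(precision, recall, beta)


if __name__ == "__main__":
    # Same two numbers, swapped: F0.5 rewards the high-precision classifier more.
    print(f_beta(0.8, 0.4))  # precision 0.8, recall 0.4 -> about 0.667
    print(f_beta(0.4, 0.8))  # precision 0.4, recall 0.8 -> about 0.444
```

For a collection task like this one, a plausible reading of the metric choice is that a false positive (a non-security test entering the dataset) is costlier than a missed test, which is what F0.5 penalizes more heavily.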