Software vulnerabilities are often detected via taint analysis, penetration testing, or fuzzing. They are also found via unit tests that exercise security-sensitive behavior with specific inputs, called vulnerability-witnessing tests. Generative AI models could help developers write such tests, but they require many examples to learn from, and these are currently scarce. This paper introduces VuTeCo, an AI-driven framework for collecting examples of vulnerability-witnessing tests from Java repositories. VuTeCo carries out two tasks: (1) the "Finding" task, which determines whether a unit test case is security-related, and (2) the "Matching" task, which relates a test case to the vulnerability it witnesses. VuTeCo addresses the Finding task with UniXcoder, achieving an F0.5 score of 0.73 and a precision of 0.83 on a test set of unit tests from Vul4J. It addresses the Matching task with DeepSeek Coder, achieving an F0.5 score of 0.65 and a precision of 0.75 on a test set of pairs of unit tests and vulnerabilities from Vul4J. VuTeCo has been used in the wild on 427 Java projects and 1,238 vulnerabilities, obtaining 224 test cases confirmed to be security-related and 35 tests correctly matched to 29 vulnerabilities. The validated tests were collected in a new dataset called Test4Vul. VuTeCo lays the foundation for large-scale retrieval of vulnerability-witnessing tests, enabling future AI models to better understand and generate security unit tests.
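For readers unfamiliar with the F0.5 score reported above: it is the F-beta measure with beta = 0.5, a weighted harmonic mean of precision and recall that weights precision more heavily than recall. The sketch below uses the standard F-beta definition; the function names are illustrative and are not taken from VuTeCo. The recall values it derives are back-calculated from the rounded figures in the abstract, so they are rough approximations and not numbers reported by the authors.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def implied_recall(precision: float, f_score: float, beta: float = 0.5) -> float:
    """Solve the F-beta formula for recall, given precision and the F-beta score.

    Hypothetical helper for illustration only; not part of VuTeCo.
    """
    b2 = beta * beta
    denom = (1 + b2) * precision - f_score
    if denom <= 0:
        raise ValueError("no valid recall exists for this precision/F-score pair")
    return f_score * b2 * precision / denom


# Rounded figures quoted in the abstract (precision, F0.5).
finding_recall = implied_recall(precision=0.83, f_score=0.73)   # roughly 0.49
matching_recall = implied_recall(precision=0.75, f_score=0.65)  # roughly 0.42

print(f"Finding task, implied recall:  {finding_recall:.2f}")
print(f"Matching task, implied recall: {matching_recall:.2f}")
```

Choosing F0.5 over F1 fits a collection setting like this one: a test wrongly labeled as vulnerability-witnessing pollutes the dataset, so false positives cost more than missed tests.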