Validating threat modeling results remains difficult because completeness is hard to judge without an external oracle. Existing studies often rely on expert-produced reference models and other human baselines, but these can contain omissions or disagreements. This paper evaluates a complementary, vulnerability-grounded validation approach: we apply threat modeling to intentionally vulnerable applications with a known set of vulnerabilities and measure how many of those vulnerabilities the resulting threats relate to. We compare ThreMoLIA, an LLM-assisted threat modeling solution developed by our team, with the Microsoft Threat Modeling Tool (MTMT) on two vulnerable applications: AzureGoat and the Vulnerable Bank Application (VulnBank). The inputs to both tools are limited to architecture diagrams, data flow diagrams, and their descriptions. ThreMoLIA achieved higher vulnerability coverage on both systems. The results indicate that vulnerable test applications provide a practical benchmark for assessing threat coverage and complement expert-based validation.
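The abstract does not define how vulnerability coverage is computed. One plausible reading, offered here only as an illustrative sketch, is the fraction of known vulnerabilities that at least one identified threat maps to. The function name, the threat-to-vulnerability mapping format, and the identifiers `V1` to `V4` and `T1` to `T3` are hypothetical placeholders, not data or definitions from the study.

```python
def vulnerability_coverage(known_vulns, threat_to_vulns):
    """Return (coverage, missed) under an assumed definition of coverage.

    known_vulns: iterable of vulnerability IDs documented for the application.
    threat_to_vulns: dict mapping each identified threat to the known
        vulnerability IDs a reviewer judged it to relate to.
    """
    known = set(known_vulns)
    if not known:
        raise ValueError("known vulnerability set must not be empty")

    covered = set()
    for matched in threat_to_vulns.values():
        # Ignore any ID that is not part of the known vulnerability set.
        covered |= known & set(matched)

    missed = sorted(known - covered)
    return len(covered) / len(known), missed


# Hypothetical placeholder data, not results from the paper.
known = ["V1", "V2", "V3", "V4"]
threats = {"T1": ["V1"], "T2": ["V1", "V3"], "T3": []}

coverage, missed = vulnerability_coverage(known, threats)
print(f"coverage = {coverage:.2f}, missed = {missed}")
```

Counting each vulnerability once, regardless of how many threats relate to it, keeps a tool from raising its score by emitting many near-duplicate threats; whether the study uses this convention is an assumption.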