Cyber attacks have become a serious threat to the security of software systems. Many organizations have built their own security knowledge bases to safeguard against attacks and vulnerabilities. However, because official security information is released with a time lag, these knowledge bases may fall out of date, which makes it challenging to use them to protect software systems against emergent security risks. Meanwhile, security posts on online knowledge-sharing platforms contain many crowd security discussions, and the knowledge in those posts can be used to enhance security knowledge bases. This paper proposes SynAT, an automatic approach to synthesizing attack trees from crowd security posts. Given a security post, SynAT first uses a Large Language Model (LLM) and prompt learning to restrict the scope to sentences that may contain attack information; it then uses a transition-based event and relation extraction model to extract events and relations simultaneously from that scope; finally, it applies heuristic rules to synthesize attack trees from the extracted events and relations. An experimental evaluation on 5,070 Stack Overflow security posts shows that SynAT outperforms all baselines in both event and relation extraction and achieves the highest tree similarity in attack tree synthesis. Furthermore, SynAT has been applied to enhance HUAWEI's security knowledge base as well as the public security knowledge bases CVE and CAPEC, which demonstrates its practicality.
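The final step of the pipeline, assembling a tree from extracted events and relations, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify SynAT's heuristic rules, so the `(parent, child, gate)` relation format, the `AND`/`OR` gate labels, and the two rules used here (keep only the first parent of an event, attach parentless events to the root) are hypothetical stand-ins, as are the names `Node`, `synthesize_tree`, and `render`. The sketch also assumes the relations contain no cycles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One attack-tree node; `gate` says how its children combine."""
    label: str
    gate: str = "LEAF"  # hypothetical labels: "AND", "OR", or "LEAF"
    children: list[Node] = field(default_factory=list)


def synthesize_tree(goal, events, relations):
    """Build a tree rooted at `goal` from (parent, child, gate) triples.

    Illustrative heuristics only (not SynAT's actual rules):
      1. an event keeps the first parent it is given, so the result is a tree;
      2. events left without a parent are attached directly under the goal.
    """
    nodes = {event: Node(event) for event in events}
    has_parent = set()
    for parent, child, gate in relations:
        if child == goal or child in has_parent:
            continue  # rule 1: never re-parent an event or the root
        nodes[parent].gate = gate
        nodes[parent].children.append(nodes[child])
        has_parent.add(child)

    root = nodes[goal]
    for event in events:
        if event != goal and event not in has_parent:
            # rule 2: treat an unattached event as an alternative path
            if root.gate == "LEAF":
                root.gate = "OR"
            root.children.append(nodes[event])
    return root


def render(node, depth=0):
    """Return the tree as a list of indented text lines."""
    suffix = "" if node.gate == "LEAF" else f" [{node.gate}]"
    lines = ["  " * depth + node.label + suffix]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Made-up events and relations, standing in for extractor output.
    events = ["steal session", "inject script",
              "find unescaped input", "sniff traffic"]
    relations = [
        ("steal session", "inject script", "OR"),
        ("steal session", "sniff traffic", "OR"),
        ("inject script", "find unescaped input", "AND"),
    ]
    print("\n".join(render(synthesize_tree("steal session", events, relations))))
```

Keeping the tree as plain nested nodes makes it straightforward to compare a synthesized tree against a reference tree, which is the kind of comparison a tree-similarity metric requires.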