Towards the generation of hierarchical attack models from cybersecurity vulnerabilities using language models

This paper investigates the use of a pre-trained language model and a siamese network to identify sibling relationships between text-based cybersecurity vulnerability records. The ultimate aim is to construct hierarchical attack models from a set of text descriptions characterising potential or observed vulnerabilities in a given system. Because of the nature of the data and the uncertainty-sensitive environment in which the problem arises, a practically oriented soft computing approach is necessary. A key focus of this work is therefore the reliability of the predicted links used to construct such models. To that end, the paper outlines conceptual and practical challenges associated with the proposed approach, such as dataset complexity and the stability of predictions, along with solutions to them. The contributions are, first, neural networks built on a pre-trained language model for predicting sibling relationships between cybersecurity vulnerabilities and, second, an outline of how to apply this capability to the generation of hierarchical attack models. In addition, two data sampling mechanisms for tackling data complexity and a consensus mechanism for reducing the number of false-positive predictions are outlined. These approaches are compared using empirical results from three sets of cybersecurity data to determine their effectiveness.
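
The abstract does not specify how the consensus mechanism works, so the following is only a minimal sketch of one plausible reading: a sibling link is accepted only if enough independently trained predictors agree on it, and the accepted links are then grouped into candidate sibling sets that could share a parent node in a hierarchical attack model. The function names (`consensus_links`, `sibling_groups`), the `min_votes` threshold, the grouping by connected components, and the vulnerability identifiers are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def consensus_links(runs, min_votes):
    """Keep only the sibling pairs predicted by at least `min_votes` runs.

    `runs` is a list of sets of (a, b) pairs, one set per predictor (assumed).
    """
    votes = Counter()
    for run in runs:
        # Normalise each pair so (a, b) and (b, a) count as the same link.
        votes.update({tuple(sorted(pair)) for pair in run})
    return {pair for pair, n in votes.items() if n >= min_votes}


def sibling_groups(items, links):
    """Group items joined by accepted links, using a simple union-find.

    Treating sibling links as transitive is an assumption of this sketch.
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical vulnerability identifiers and predictions from three runs.
items = ["V1", "V2", "V3", "V4"]
runs = [
    {("V1", "V2"), ("V3", "V4")},
    {("V2", "V1"), ("V1", "V3")},
    {("V1", "V2"), ("V3", "V4")},
]

links = consensus_links(runs, min_votes=2)
groups = sibling_groups(items, links)
print(groups)
```

In this sketch, raising `min_votes` trades recall for fewer false-positive links: the pair predicted by only one run is discarded, so it never merges the two sibling groups.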

