Secure by Design has become a mainstream development approach for ensuring that software systems are not vulnerable to cyberattacks. Architectural security controls need to be carefully monitored throughout the software development life cycle to avoid critical design flaws. Unfortunately, functional requirements usually get in the way of security features, and the development team may fail to address critical security requirements correctly. Identifying tactic-related code pieces in a software project enables an efficient review of how security controls are implemented and helps maintain a resilient software architecture. This paper enumerates a comprehensive list of commonly used security controls and creates a dataset for each of them by pulling related and unrelated code snippets from the open API of the StackOverflow question-and-answer platform. It uses a state-of-the-art NLP technique, Bidirectional Encoder Representations from Transformers (BERT), together with the Tactic Detector from our prior work to show that code pieces implementing security controls can be identified with high confidence. The results show that our model, trained on tactic-related and unrelated code snippets derived from StackOverflow, identifies tactic-related code pieces with F-Measure values above 0.9.
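The F-Measure reported above is conventionally the harmonic mean of precision and recall, F = 2PR / (P + R), computed for the positive (tactic-related) class. The following is a minimal sketch of that standard computation for a binary "related vs. unrelated" classification task. The label strings `"related"` and `"unrelated"` and the function name `f_measure` are hypothetical illustrations, not taken from the paper, and the authors may have used a library implementation instead.

```python
from typing import Sequence


def f_measure(y_true: Sequence[str], y_pred: Sequence[str],
              positive: str = "related") -> float:
    """Standard F1 score for the positive class (hypothetical label names)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: tactic-related snippets predicted as tactic-related.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: unrelated snippets wrongly flagged as tactic-related.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: tactic-related snippets the classifier missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example: 5 related and 5 unrelated snippets, one error in each class.
y_true = ["related"] * 5 + ["unrelated"] * 5
y_pred = ["related"] * 4 + ["unrelated"] + ["related"] + ["unrelated"] * 4
print(f_measure(y_true, y_pred))  # ≈ 0.8 (precision 0.8, recall 0.8)
```

Under these toy labels, a score "above 0.9" would require fewer than one misclassification per class on average; the harmonic mean penalizes a classifier that trades recall for precision (or vice versa) more than a simple average would.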