Dataflow diagrams (DFDs) are a valuable asset for securing applications, as they are the starting point for many security assessment techniques. They are often created manually, however, which is time-consuming and raises concerns about their correctness. Furthermore, as applications are continuously extended and modified in CI/CD pipelines, the DFDs must be kept in sync with the code, which is also challenging. In this paper, we present a novel, tool-supported technique to automatically extract DFDs from the implementation code of microservices. The technique parses source code and configuration files in search of keywords that serve as evidence for the model extraction. Our approach iteratively detects new keywords, thereby snowballing through an application's codebase. Coupled with other detection techniques, it produces a fully-fledged DFD enriched with security-relevant annotations. The extracted DFDs also provide full traceability between model items and code snippets. We evaluate our approach and the accompanying prototype for applications written in Java on a manually curated dataset of 17 open-source applications. In our testing set of applications, we observe an overall precision of 93% and recall of 85%.
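
The abstract does not spell out the extraction rules, so the following is only a minimal sketch of what an iterative, snowballing keyword search could look like, not the authors' implementation. It assumes a deliberately naive rule: whenever a line mentions a known keyword and also contains a simple assignment, the assigned variable name becomes a new keyword. The function name `snowball`, the `ASSIGNMENT` pattern, and the sample Java lines are all hypothetical and chosen for illustration. Each hit is stored with its file name and line number to mimic the traceability between model items and code snippets.

```python
import re

# Hypothetical, simplified pattern: captures the variable name on the left of
# a plain assignment such as `Foo bar = ...;` (it does not match `==`).
ASSIGNMENT = re.compile(r"(\w+)\s*=[^=]")


def snowball(files, seeds):
    """Return {keyword: [(file_name, line_number), ...]} reachable from seeds.

    `files` maps a file name to its text. This is an illustrative assumption
    about how keyword snowballing might work, not the published algorithm.
    """
    evidence = {}
    worklist = list(seeds)
    while worklist:
        keyword = worklist.pop()
        if keyword in evidence:
            continue  # already processed, so the loop terminates
        evidence[keyword] = []
        word = re.compile(r"\b" + re.escape(keyword) + r"\b")
        for name, text in files.items():
            for number, line in enumerate(text.splitlines(), start=1):
                if not word.search(line):
                    continue
                # Record where the keyword was seen (traceability).
                evidence[keyword].append((name, number))
                # Naive propagation: the assigned variable becomes a keyword.
                match = ASSIGNMENT.search(line)
                if match and match.group(1) not in evidence:
                    worklist.append(match.group(1))
    return evidence


if __name__ == "__main__":
    # Made-up Java snippets used purely as input data.
    sample = {
        "OrderClient.java": (
            "RestTemplate client = new RestTemplate();\n"
            'String url = client.getForObject("http://orders/api", String.class);\n'
            "log.info(url);\n"
        ),
        "Other.java": "int x = 1;\n",
    }
    for keyword, hits in snowball(sample, ["RestTemplate"]).items():
        print(keyword, hits)
```

Starting from the single seed `RestTemplate`, the sketch picks up `client` and then `url`, while the unrelated variable `x` is never reached. A real extractor would need proper parsing rather than a regular expression, and would have to check on which side of the assignment the keyword appears.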