Python is one of the most popular programming languages, and projects written in it are exposed to a growing number and variety of security vulnerabilities. However, existing state-of-the-art analysis tools for Python support only a few vulnerability types, so there is a need for an approach that detects a wide variety of vulnerabilities in Python projects. In this paper, we propose SAGA, a versatile approach to detect and locate vulnerabilities in Python source code. SAGA comprises a source code parser that extracts control- and data-flow information and represents it as a symbolic control-flow graph, and a domain-specific language for defining static aspects of the source code and their evolution during graph traversals. We have leveraged this language to define a library of static aspects for integrity, confidentiality, and other security-related properties. We have evaluated SAGA on a dataset of 108 vulnerabilities, obtaining 100% sensitivity and 99.15% specificity, with only one false positive, while outperforming four common security analysis tools. The analysis took less than 31 seconds, making SAGA between 2.5 and 512.1 times faster than the baseline tools.
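To make the idea of a static aspect that evolves during traversal more concrete, the following is a minimal sketch, not SAGA's parser or its domain-specific language. It assumes a single "tainted" aspect, a hypothetical source set (`input`) and sink set (`eval`, `exec`), and it walks only top-level statements in order using Python's standard `ast` module rather than a symbolic control-flow graph. All function and variable names are illustrative.

```python
import ast

# Assumptions for illustration only; these sets are not taken from SAGA.
SOURCES = {"input"}        # calls whose result is treated as untrusted
SINKS = {"eval", "exec"}   # calls that must not receive untrusted data


def call_name(node):
    """Return the plain function name of a call node, or None otherwise."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None


def is_tainted(expr, tainted):
    """True if the expression reads a tainted variable or calls a source."""
    for node in ast.walk(expr):
        if (isinstance(node, ast.Name)
                and isinstance(node.ctx, ast.Load)
                and node.id in tainted):
            return True
        if call_name(node) in SOURCES:
            return True
    return False


def find_tainted_sinks(source):
    """Visit top-level statements in order and return the line numbers
    of sink calls that receive a tainted argument."""
    tainted = set()   # the "aspect": variables currently holding untrusted data
    findings = []
    for stmt in ast.parse(source).body:
        # Check sinks against the aspect state before this statement.
        for node in ast.walk(stmt):
            if call_name(node) in SINKS and any(
                    is_tainted(arg, tainted) for arg in node.args):
                findings.append(node.lineno)
        # Update the aspect: an assignment taints or cleans its targets.
        if isinstance(stmt, ast.Assign):
            rhs_tainted = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    if rhs_tainted:
                        tainted.add(target.id)
                    else:
                        tainted.discard(target.id)
    return findings


EXAMPLE = """\
user = input()
safe = "1 + 1"
eval(safe)
copy = user
eval(copy)
user = "2 + 2"
eval(user)
"""

print(find_tainted_sinks(EXAMPLE))  # [5]
```

Only the `eval(copy)` call on line 5 is reported: `safe` never holds untrusted data, and `user` is reassigned to a constant before it reaches `eval`. A graph-based analysis such as the one described above would additionally have to handle branches, loops, and function calls, which this straight-line sketch deliberately ignores.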