Static analysis is a classical technique for improving software security and software quality in general. Fairly recently, a new static analyzer was implemented in the GNU Compiler Collection (GCC). The present paper uses GCC's analyzer to empirically examine popular Linux packages. The dataset is based on those packages in the Gentoo Linux distribution that are either written in C or contain C code. In total, $3,538$ such packages are covered. According to the results, uninitialized variables and NULL pointer dereferences are the most common problems reported by the analyzer. Classical memory management issues are relatively rare. The warnings also follow a long-tailed probability distribution across the packages; a few packages are highly warning-prone, whereas no warnings are present for as many as 89% of the packages. Furthermore, the warnings do not vary across different application domains. With these results, the paper contributes to the domain of large-scale empirical research on software quality and security. In addition, the practical implications of the results are discussed.