Mutation-based Evaluation of Cryptographic API Misuse Detectors

From arXiv. Accepted at the journal ACM TOPS (2026). Extends the IEEE S&P 2022 conference paper with an updated evaluation, an extended study, and an extended taxonomy (conference version: arXiv:2107.07065).

The correct use of cryptography is central to ensuring data security in modern software systems. Hence, several academic and commercial static analysis tools have been developed for detecting and mitigating crypto-API misuse. While developers are optimistically adopting these crypto-API misuse detectors (or crypto-detectors) in their software development cycles, this momentum must be accompanied by a rigorous understanding of their effectiveness at finding crypto-API misuse in practice. This paper describes the MASC framework, which enables a systematic and data-driven evaluation of crypto-detectors using mutation testing. We ground MASC in a comprehensive view of the problem space by developing a data-driven taxonomy of existing crypto-API misuse, containing 107 misuse cases organized among nine semantic clusters. We develop 19 generalizable usage-based mutation operators and three mutation scopes that can expressively instantiate thousands of compilable variants of the misuse cases for thoroughly evaluating crypto-detectors. Using MASC, in a previous study, we evaluated nine major crypto-detectors and discovered 19 unique, undocumented flaws that severely impact the ability of crypto-detectors to discover misuses in practice. This paper substantially extends our MASC framework and offers an updated evaluation of the crypto-detectors from our 2022 study, along with five additional major crypto-detectors. Through this work, we find six new, undocumented flaws and demonstrate that these flaws affect crypto-detectors regardless of their origin: the open-source community, industry, or research. We conclude with a discussion of the diverse perspectives that influence the design of crypto-detectors and of future directions towards building security-focused crypto-detectors by design.
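
To make the idea of a usage-based mutation concrete, the following is a minimal sketch in Java, not the MASC implementation. It assumes a single misuse case (requesting the weak DES cipher through `javax.crypto.Cipher.getInstance`) and shows a few hand-written ways to express the same algorithm name so that a detector matching only the literal `"DES"` could miss it. The class name `MutationSketch` and the method `desVariants` are hypothetical, and the specific rewrites are illustrative assumptions rather than the paper's 19 operators.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import javax.crypto.Cipher;

// Hypothetical illustration only: hand-written variants of one misuse case.
class MutationSketch {

    // Each entry is a different source-level expression that evaluates
    // to the same insecure algorithm name, "DES", at runtime.
    static List<String> desVariants() {
        String baseline = "DES";                              // plain literal
        String caseChanged = "des".toUpperCase(Locale.ROOT);  // case transformation
        String concatenated = "D" + "ES";                     // string concatenation
        String replaced = "AES".replace("A", "D");            // starts from a safe-looking literal
        return Arrays.asList(baseline, caseChanged, concatenated, replaced);
    }

    public static void main(String[] args) {
        for (String algorithm : desVariants()) {
            try {
                // Every variant reaches the crypto API with the same weak algorithm.
                Cipher cipher = Cipher.getInstance(algorithm);
                System.out.println(algorithm + " -> " + cipher.getAlgorithm());
            } catch (GeneralSecurityException e) {
                // The provider may not offer the algorithm on every runtime.
                System.out.println(algorithm + " -> unavailable: " + e.getMessage());
            }
        }
    }
}
```

A detector under evaluation would be run on code containing each variant; any variant that it does not flag, while it does flag the plain literal, points to a potential gap in its analysis.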
