Traditional equivalence checking classifies a pair of programs as either equivalent or non-equivalent. This binary verdict provides too little information for tasks such as patch impact analysis, where the patched version of a program is expected to be non-equivalent to the original. When two program versions are non-equivalent, determining the conditions under which they differ and the percentage of inputs affected remains an open challenge. In this work, we introduce quantitative partial equivalence analysis, an approach for assessing software patches by quantifying the behavioral differences between the original (vulnerable) code and the patched code. Using symbolic analysis, we identify the input conditions under which the patched and original programs exhibit identical or divergent behaviors. Our approach refines non-equivalence by measuring the extent of behavioral divergence across the input domain. For efficient quantitative analysis of numerical domains, we propose a range-based search heuristic that provides a sound lower bound on the portion of the input domain over which the two versions are equivalent. We demonstrate our approach on 90 CVE patches from widely used open-source projects (Linux, Qemu, FFmpeg), as well as on a dataset of programs containing CWEs that is based on the Juliet Test Suite. Our results show that quantitative partial equivalence analysis effectively characterizes and quantifies patch impact. Additionally, experiments on the EqBench benchmark reveal five C program pairs that are mislabeled as equivalent, and we identify the input conditions under which their behaviors diverge.
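To make the idea of quantifying partial equivalence concrete, the following minimal Python sketch compares a toy "original" function with a toy "patched" one over a small integer domain. It is an illustration only and not the method of this work: the functions `original` and `patched`, the constant `BUF_LEN`, the domain bounds, and the `min_width` cutoff are all hypothetical, and exhaustive enumeration stands in for the symbolic queries a real analysis would issue. The range search counts a range only when every input in it is shown to agree, and gives up on mixed ranges that are already narrow, so its result can only under-count the equivalent inputs.

```python
from fractions import Fraction

BUF_LEN = 20  # hypothetical buffer length guarded by the patch


def original(idx: int) -> int:
    # Toy "vulnerable" version: no bounds check on idx.
    return idx * 2


def patched(idx: int) -> int:
    # Toy "patched" version: rejects out-of-range indices with -1.
    if idx < 0 or idx >= BUF_LEN:
        return -1
    return idx * 2


def agree_on_range(lo: int, hi: int) -> bool:
    # Stand-in for a solver query: do both versions agree on all of [lo, hi]?
    return all(original(x) == patched(x) for x in range(lo, hi + 1))


def exact_equivalent(lo: int, hi: int) -> int:
    # Ground truth by enumeration (feasible only for tiny domains).
    return sum(1 for x in range(lo, hi + 1) if original(x) == patched(x))


def lower_bound_equivalent(lo: int, hi: int, min_width: int) -> int:
    # Count inputs in [lo, hi] that lie in ranges proven fully equivalent.
    if agree_on_range(lo, hi):
        return hi - lo + 1
    if hi - lo + 1 <= min_width:
        return 0  # mixed range, too narrow to split further: count nothing
    mid = (lo + hi) // 2
    return (lower_bound_equivalent(lo, mid, min_width)
            + lower_bound_equivalent(mid + 1, hi, min_width))


def as_fraction(count: int, lo: int, hi: int) -> Fraction:
    # Express a count of inputs as a share of the whole domain.
    return Fraction(count, hi - lo + 1)


if __name__ == "__main__":
    lo, hi = -64, 63
    bound = lower_bound_equivalent(lo, hi, min_width=8)
    exact = exact_equivalent(lo, hi)
    print("lower bound:", as_fraction(bound, lo, hi))  # 1/8
    print("exact:      ", as_fraction(exact, lo, hi))  # 5/32
```

In this toy setting the two versions agree exactly on indices 0 through 19. With `min_width=8` the search proves only the range 0 through 15 and abandons the mixed range 16 through 23, so it reports 16 of 128 inputs as equivalent against a true value of 20 of 128. A smaller cutoff tightens the bound at the cost of more queries.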