Equivalence checking is used to verify whether two programs produce equivalent outputs when given equivalent inputs. Research in this field mainly focused on improving equivalence checking accuracy and runtime performance. However, for program pairs that cannot be proven to be either equivalent or non-equivalent, existing approaches only report a classification result of "unknown", which provides no information regarding the programs' non-/equivalence. In this paper, we introduce PASDA, our partition-based semantic differencing approach with best effort classification of undecided cases. While PASDA aims to formally prove non-/equivalence of analyzed program pairs using a variant of differential symbolic execution, its main novelty lies in its handling of cases for which no formal non-/equivalence proof can be found. For such cases, PASDA provides a best effort equivalence classification based on a set of classification heuristics. We evaluated PASDA with an existing benchmark consisting of 141 non-/equivalent program pairs. PASDA correctly classified 61-74% of these cases at timeouts from 10 seconds to 3600 seconds. Thus, PASDA achieved equivalence checking accuracies that are 3-7% higher than the best results achieved by three existing tools. Furthermore, PASDA's best effort classifications were correct for 70-75% of equivalent and 55-85% of non-equivalent cases across the different timeouts.
翻译:等价性检查用于验证两个程序在给定等价输入时是否产生等价输出。该领域的研究主要聚焦于提升等价性检查的准确性和运行时性能。然而,对于无法确证为等价或不等价的程序对,现有方法仅报告“未知”的分类结果,无法提供关于程序等价/不等价性的任何信息。本文提出PASDA——一种基于划分的语义差异方法,能够对未定情形进行最佳努力分类。PASDA旨在利用微分符号执行的变体形式化证明所分析程序对的等价/不等价性,其核心创新在于处理无法找到形式化等价/不等价证明的情形。针对此类情形,PASDA基于一组分类启发式规则提供最佳努力等价性分类。我们使用包含141个等价/不等价程序对的现有基准对PASDA进行评估。在超时时间从10秒到3600秒的范围内,PASDA正确分类了其中61%-74%的情形。因此,PASDA实现的等价性检查准确率比三种现有工具的最佳结果高出3%-7%。此外,在不同超时设置下,PASDA的最佳努力分类对等价情形的正确率达到70%-75%,对不等价情形的正确率达到55%-85%。