Names Are All You Need: Effective and Safe Regression Test Selection for Python

Regression test selection (RTS) reduces the cost of regression testing by executing only those tests affected by a code change. Despite extensive study of RTS in statically typed languages, achieving effective and safe RTS in Python remains challenging. Python's dynamic typing makes precise call-graph construction difficult, which can cause call-graph-based RTS to miss affected tests. Python's eager importing mechanism, in contrast, renders file-level dependency analysis overly conservative. This paper presents NameRTS, the first Python RTS approach based on fine-grained dependency analysis. NameRTS models a Python program as a bipartite graph of code element nodes and name nodes, with edges capturing definitions and references. RTS is formulated as a reachability problem on this graph: a test is selected if any modified code element is reachable from the names used in that test. This design avoids call-graph construction and keeps the analysis conservative, which makes it amenable to safety. To control dependency cascades introduced by coarse name matching, NameRTS applies two pruning strategies that leverage prior test executions and context information to refine name matching. To evaluate NameRTS, we construct the first Python RTS dataset with a ground truth indicating which test files are affected by each commit. We compare NameRTS with the best-performing baseline, BabelRTS, an RTS technique based on coarse file-level dependencies. On this benchmark, NameRTS skips 69.90% of test files on average, outperforming BabelRTS by 146.5%. It also reduces end-to-end testing time by 45.59%, yielding a 107.7% improvement over BabelRTS. In terms of safety, NameRTS selects all affected tests for 99.6% of commits, with only rare misses in exceptional cases. In contrast, BabelRTS is safe for 76.6% of commits. These results demonstrate the effectiveness of NameRTS, paving the way for more efficient regression testing in Python.
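The reachability formulation can be illustrated with a minimal sketch. This is not the NameRTS implementation: the class `NameGraph`, the function `select_tests`, and the element identifiers are hypothetical names chosen for illustration. The sketch assumes that names are matched by plain string equality and that a modified test is itself selected. The two pruning strategies are not modeled.

```python
from collections import defaultdict, deque


class NameGraph:
    """Bipartite graph with code elements on one side and names on the other."""

    def __init__(self):
        # name -> code elements that define it (definition edges)
        self.defines = defaultdict(set)
        # code element -> names it references (reference edges)
        self.references = defaultdict(set)

    def add_definition(self, element, name):
        self.defines[name].add(element)

    def add_reference(self, element, name):
        self.references[element].add(name)

    def reachable_elements(self, start):
        """Breadth-first search: element -> referenced name -> defining elements."""
        seen = {start}  # assumption: the start element counts as reachable
        queue = deque([start])
        while queue:
            element = queue.popleft()
            for name in self.references.get(element, ()):
                for target in self.defines.get(name, ()):
                    if target not in seen:
                        seen.add(target)
                        queue.append(target)
        return seen


def select_tests(graph, tests, modified):
    """Select each test from which at least one modified element is reachable."""
    modified = set(modified)
    return sorted(t for t in tests if graph.reachable_elements(t) & modified)


# Hypothetical project: two unrelated methods share the name "total".
graph = NameGraph()
graph.add_definition("cart.py::Cart.total", "total")
graph.add_definition("report.py::Report.total", "total")
graph.add_definition("cart.py::apply_tax", "apply_tax")
graph.add_definition("util.py::slugify", "slugify")
graph.add_reference("cart.py::Cart.total", "apply_tax")
graph.add_reference("tests/test_cart.py", "total")
graph.add_reference("tests/test_util.py", "slugify")

tests = ["tests/test_cart.py", "tests/test_util.py"]
selected_for_tax = select_tests(graph, tests, {"cart.py::apply_tax"})
selected_for_report = select_tests(graph, tests, {"report.py::Report.total"})
```

In this toy project, changing `apply_tax` selects only `tests/test_cart.py`, via the name `total` and then the name `apply_tax`. Changing `Report.total` also selects `tests/test_cart.py`, because the name `total` matches both definitions. This is the kind of cascade from coarse name matching that the pruning strategies described above are meant to reduce.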
