Python developers rely on two major testing frameworks: \texttt{unittest} and \texttt{Pytest}. While \texttt{Pytest} offers simpler assertions, reusable fixtures, and better interoperability, migrating existing suites from \texttt{unittest} remains a manual and time-consuming process. Automating this migration could substantially reduce effort and accelerate test modernization. In this paper, we investigate the capability of Large Language Models (LLMs) to automate test framework migrations from \texttt{unittest} to \texttt{Pytest}. We evaluate GPT-4o and Claude Sonnet 4 under three prompting strategies (Zero-shot, One-shot, and Chain-of-Thought) and two temperature settings (0.0 and 1.0). To support this analysis, we first introduce a curated dataset of real-world migrations extracted from the top 100 Python open-source projects. We then execute the LLM-generated test migrations within their respective test suites. Overall, we find that 51.5% of the LLM-generated test migrations failed, while 48.5% passed. The results suggest that LLMs can accelerate test migration, but with important caveats. For example, Claude Sonnet 4 produced more conservative migrations (e.g., preserving class-based tests and legacy \texttt{unittest} references), while GPT-4o favored more aggressive transformations (e.g., converting to function-based tests). We conclude by discussing multiple implications for practitioners and researchers.