Understanding Self-Admitted Technical Debt in Test Code: An Empirical Study

Developers often opt for easier but non-optimal implementations to meet deadlines or build rapid prototypes, incurring technical debt: additional effort required later to improve the code. Developers frequently document this debt explicitly in code comments, a practice referred to as Self-Admitted Technical Debt (SATD). Numerous researchers have investigated the impact of SATD on different aspects of software quality and development processes. However, most of these studies focus on SATD in production code, either overlooking SATD in test code or assuming that it shares the characteristics of SATD in production code. In fact, a significant amount of SATD is present in test code, and many instances do not fit the categories established for production code. This study aims to fill this gap by examining the distribution and types of SATD in test code and by analyzing the relation between its presence and test quality. Our empirical study, involving 17,766 SATD comments (14,987 from production code, 2,779 from test code) collected from 50 repositories, demonstrates that while SATD is widespread in test code, it is not directly associated with test smells. We also present a comprehensive categorization of SATD types in test code and develop machine learning models that automatically classify SATD comments by type for easier management. Our results show that the CodeBERT-based model outperforms the other machine learning models in terms of recall and F1-score, although its performance varies across SATD types.
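To make the production/test split concrete, the following is a minimal sketch of how SATD comments might be flagged and counted separately for test and production code. It is an illustration under stated assumptions, not the study's method: the marker list (`TODO`, `FIXME`, `HACK`, `XXX`), the path heuristics in `is_test_path`, and the sample comments are all hypothetical.

```python
import re
from collections import Counter

# Assumed marker list for illustration; the study's detection approach is not given here.
SATD_MARKERS = re.compile(r"\b(TODO|FIXME|HACK|XXX)\b", re.IGNORECASE)


def is_test_path(path: str) -> bool:
    """Heuristic (assumption): a file is test code if it lives under a
    test/tests directory or follows a common test-file naming pattern."""
    parts = path.replace("\\", "/").split("/")
    name = parts[-1]
    dirs = [d.lower() for d in parts[:-1]]
    return (
        any(d in ("test", "tests") for d in dirs)
        or name.lower().startswith("test_")
        or re.search(r"(_test|Tests?)\.\w+$", name) is not None
    )


def count_satd(comments):
    """Count SATD-like comments per code kind.

    `comments` is an iterable of (file_path, comment_text) pairs.
    """
    counts = Counter()
    for path, text in comments:
        if SATD_MARKERS.search(text):
            kind = "test" if is_test_path(path) else "production"
            counts[kind] += 1
    return counts


# Hypothetical sample comments, not taken from the studied repositories.
sample = [
    ("src/main/java/app/Parser.java", "// TODO: handle nested quotes properly"),
    ("src/test/java/app/ParserTest.java", "// FIXME: flaky on CI, disable for now"),
    ("src/main/java/app/Util.java", "// Returns the trimmed string"),
    ("tests/test_io.py", "# HACK: sleep to wait for the server"),
]
print(count_satd(sample))
```

A keyword filter of this kind only identifies candidate SATD comments; assigning each comment to a type is the classification task that the learned models address.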
