Self-Admitted Technical Debt in LLM Software: An Empirical Comparison with ML and Non-ML Software

Self-admitted technical debt (SATD), that is, developer-written comments that explicitly acknowledge suboptimal code or incomplete functionality, has received extensive attention in machine learning (ML) and traditional (non-ML) software. However, little is known about how SATD manifests and evolves in contemporary Large Language Model (LLM)-based systems, whose architectures, workflows, and dependencies differ fundamentally from both traditional and pre-LLM ML software. In this paper, we conduct the first empirical study of SATD in the LLM era, replicating and extending prior work on ML technical debt to modern LLM-based systems. We compare SATD prevalence across LLM, ML, and non-ML software, using a total of 477 repositories (159 per category). We perform survival analysis of SATD introduction and removal to understand the dynamics of technical debt across different development paradigms. Surprisingly, our results reveal that LLM repositories, despite their architectural complexity, accumulate SATD at rates similar to ML systems (3.95% vs. 4.10%). However, we observe that LLM repositories remain debt-free 2.4x as long as ML repositories (a median of 492 days vs. 204 days) and then start to accumulate technical debt rapidly. Moreover, our qualitative analysis of 377 SATD instances reveals three new forms of technical debt unique to LLM-based development that have not been reported in prior research: Model-Stack Workaround Debt, Model Dependency Debt, and Performance Optimization Debt. Finally, by mapping SATD to stages of the LLM development pipeline, we observe that debt concentrates
