Comparing ML-Specific and General Python Code Smells Across Project Characteristics

Machine learning systems combine general-purpose code with ML-specific code. Although ML-specific code smells have been identified, their relationship to project characteristics and to overall code quality is not well understood. Without this knowledge, quality assurance strategies remain one-size-fits-all and fail to account for the contextual factors that drive technical debt in ML systems. We present empirical evidence on how six project characteristics (size, age, number of contributors, commit frequency, CI/CD adoption, and domain) relate to both ML-specific and general Python code quality in 279 open-source ML projects on GitHub. Using CodeSmile to detect ML-specific code smells and Pylint to detect general Python smells, we find that: (1) ML-specific code smells are 41 to 94 times less frequent than general Python smells; (2) commit frequency and domain are significantly associated with ML-specific quality, whereas project size, number of contributors, age, and CI/CD adoption are not, which challenges traditional views on technical debt; (3) general Python smells are not associated with any project characteristic, indicating systemic coding issues that are independent of project context; (4) the domains most affected by ML-specific smells are not necessarily those most affected by general Python smells, so each smell type calls for a tailored quality strategy. For example, MLOps projects often exhibit configuration issues, Reinforcement Learning projects struggle with tensor manipulation, and Computer Vision projects encounter problems in GPU workflows. Overall, ML code quality depends on domain-specific practices and specialized CI/CD quality gates, because standard automation often overlooks domain-specific correctness problems.
