This study examines the relationship between developer sentiment and code quality in machine learning (ML) projects, using sentiment analysis (SA) to show how developers' emotional dynamics affect the technical and functional attributes of software projects. The research applies sentiment analysis techniques to textual interactions (code comments, commit messages, and issue discussions) in high-profile ML projects. Drawing on a dataset of popular ML repositories, the analysis combines rule-based, machine learning, and hybrid sentiment analysis methods to quantify sentiment scores systematically. The emotional valence expressed by developers is then correlated with a range of code quality indicators: bugs, vulnerabilities, security hotspots, code smells, and duplicated code. The findings show that positive developer sentiment is strongly associated with better code quality, reflected in fewer bugs and fewer code smells. This relationship underscores the importance of fostering a positive emotional environment to support productivity and code craftsmanship. Conversely, negative sentiment correlates with more code issues, particularly increased duplication and heightened security risk, pointing to the detrimental effect of adverse emotional conditions on project health.
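The pipeline summarized above (score the sentiment of developer text, then correlate it with a quality indicator) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon, the project names, the commit messages, and the bug counts are invented, and the simple lexicon scorer with Pearson correlation is a stand-in rather than the study's actual tooling or statistics.

```python
import math
import re

# Hypothetical toy lexicon; a real study would use an established SA tool.
POSITIVE = {"great", "clean", "fixed", "thanks", "nice", "good"}
NEGATIVE = {"broken", "ugly", "hack", "terrible", "bad", "annoying"}


def sentiment_score(text):
    """Rule-based score in [-1, 1]: (positives - negatives) / lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits


def mean_sentiment(messages):
    """Average sentiment over all messages of one project."""
    return sum(sentiment_score(m) for m in messages) / len(messages)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Invented illustrative data: (commit messages, bug count) per project.
projects = {
    "proj_a": (["great refactor, clean API", "thanks, nice fix"], 3),
    "proj_b": (["ugly hack for broken loader", "good tests"], 9),
    "proj_c": (["terrible bug, annoying and broken"], 15),
}

scores = [mean_sentiment(msgs) for msgs, _ in projects.values()]
bugs = [count for _, count in projects.values()]
r = pearson(scores, bugs)
print(f"sentiment vs. bug count: r = {r:.2f}")
```

A negative `r` on real data would be in line with the reported association between positive sentiment and fewer bugs; with only three invented projects, the value printed here carries no evidential weight.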