Predicting the Understandability of Computational Notebooks through Code Metrics Analysis

Computational notebooks have become the primary coding environment for data scientists. However, research on their code quality is still emerging, and shared notebook code is often of poor quality. Because maintenance and reuse depend on comprehension, it is important to understand which metrics affect the understandability of notebook code. Code understandability is a qualitative variable that is closely tied to user opinion. Traditional approaches measure it either with limited questionnaires that review a few code snippets or with repository metadata such as likes and votes. Our approach improves the measurement of Jupyter notebook understandability by leveraging user comments related to code understandability. As a case study, we used 542,051 Kaggle Jupyter notebooks from DistilKaggle, a dataset from our previous research. We employed a fine-tuned DistilBERT transformer to identify user comments associated with code understandability. We then established a criterion called User Opinion Code Understandability (UOCU), which considers the number of relevant comments, the upvotes on those comments, total notebook views, and total notebook upvotes. UOCU proved to be more effective than previous methods. Furthermore, we trained machine learning models to predict notebook code understandability from code metrics alone. We collected 34 metrics for the 132,723 final notebooks as features in our dataset, using UOCU as the label. Our predictive model, a Random Forest classifier, achieved 89% accuracy in predicting the understandability levels of computational notebooks.
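
To make the four UOCU inputs concrete, the following is a minimal sketch of how such signals could be combined into a single score. The abstract does not state the UOCU formula, so everything here is an illustrative assumption rather than the authors' definition: the `NotebookStats` type, the `uocu_like_score` function, the weights, and the log-damped normalization by views are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class NotebookStats:
    """The four inputs the abstract lists for UOCU (field names are hypothetical)."""
    relevant_comments: int  # comments flagged as understandability-related
    comment_upvotes: int    # upvotes on those flagged comments
    views: int              # total notebook views
    upvotes: int            # total notebook upvotes


def uocu_like_score(nb: NotebookStats,
                    w_comments: float = 1.0,
                    w_votes: float = 1.0) -> float:
    """Hypothetical UOCU-style score, NOT the formula from the paper.

    Assumption: opinion signals (relevant comments, their upvotes, notebook
    upvotes) are summed with weights and divided by log(1 + views), so that
    heavily viewed notebooks are not favored merely for their exposure.
    """
    exposure = math.log1p(nb.views)
    if exposure == 0:
        # A notebook with no views carries no usable opinion signal.
        return 0.0
    comment_signal = nb.relevant_comments + nb.comment_upvotes
    return (w_comments * comment_signal + w_votes * nb.upvotes) / exposure


if __name__ == "__main__":
    example = NotebookStats(relevant_comments=4, comment_upvotes=6,
                            views=2000, upvotes=25)
    print(f"UOCU-like score: {uocu_like_score(example):.3f}")
```

In a pipeline like the one described, a score of this kind would be binned into understandability levels and used as the label for a classifier trained on the 34 code metrics; the binning thresholds are likewise not given in the abstract.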
