Jupyter Notebook is a popular tool among data analysts and scientists for working with data. It combines code, documentation, and visualizations in a single interactive environment, which facilitates code reuse. While code reuse can improve programming efficiency, it can also degrade readability, security, and overall performance. To understand these potential negative impacts, we conduct a large-scale exploratory study of code reuse practices in the Jupyter Notebook development community on the Stack Overflow platform. We identify 1,097,470 Jupyter Notebook clone pairs that reuse Stack Overflow code snippets and find that the average code snippet has 7.91 code quality violations. Our results offer insight into why Jupyter Notebook developers choose to reuse code and into the potential drawbacks of this practice.