Context: Jupyter Notebook has emerged as a versatile tool that has transformed how researchers, developers, and data scientists conduct and communicate their work. As adoption of Jupyter notebooks continues to grow, so does the software engineering research community's interest in improving software engineering practices for them. Objective: This study analyzes the trends, gaps, and methodologies in software engineering research on Jupyter notebooks. Method: Following established systematic literature review guidelines, we selected 146 relevant publications from the DBLP Computer Science Bibliography published up to the end of 2024. We examined publication trends, categorized the publications by software engineering topic, and reported our findings for each topic. Results: The most popular venues for software engineering research on Jupyter notebooks are human-computer interaction venues rather than traditional software engineering venues. Researchers have addressed a wide range of software engineering topics for notebooks, such as code reuse, readability, and execution environments. Although reusability is itself one of the research topics for Jupyter notebooks, only 64 of the 146 studies can be reused via the URLs they provide. Moreover, most replication packages are not hosted in permanent repositories, which undermines their long-term availability and their adherence to open science principles. Conclusion: Notebook-specific solutions to software engineering issues such as testing, refactoring, and documentation remain underexplored. Future research opportunities include automatic testing frameworks, refactoring of code clones across notebooks, and generating group-level documentation for coherent code cells.