Jupyter notebooks are widely used for machine learning (ML) prototyping. Yet few debugging tools are designed for ML code in notebooks, possibly due to the lack of benchmarks. We introduce JunoBench, the first benchmark dataset of real-world crashes in Python-based ML notebooks. JunoBench contains 111 curated, reproducible crashes from public Kaggle notebooks, each paired with a verifiable fix, spanning popular ML libraries, including TensorFlow/Keras, PyTorch, Scikit-learn, Pandas, and NumPy, as well as notebook-specific out-of-order execution issues. To support reproducibility and ease of use, JunoBench offers a unified execution environment in which crashes and fixes can be reliably reproduced. By providing realistic crashes and their resolutions, JunoBench facilitates bug detection, localization, and repair tailored to the interactive and iterative nature of notebook-based ML development.
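To make the notebook-specific failure mode concrete, the following minimal sketch (hypothetical cells, not drawn from the benchmark) flattens a three-cell notebook into a script. Run top to bottom it succeeds; the crash only appears under out-of-order execution, e.g. when Cell 3 is re-run after a kernel restart without re-executing Cell 2.

```python
# Illustrative out-of-order execution crash in a notebook (hypothetical example).

import pandas as pd

# Cell 1: load the data
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "label": [0, 1, 0]})

# Cell 2: derive a column used later for training
df["feature_sq"] = df["feature"] ** 2

# Cell 3: if the kernel is restarted and only Cells 1 and 3 are re-run,
# this line raises KeyError: 'feature_sq', because the derived column
# from Cell 2 no longer exists in the session state.
X = df[["feature", "feature_sq"]]
print(X)
```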