While results visualization is a critical phase in communicating new academic results, plots are frequently shared without the complete combination of code, input data, execution context, and outputs required to independently reproduce the resulting figures. Existing reproducibility solutions tend to focus on computational pipelines or workflow management systems and do not cover the script-based visualization practices commonly used by researchers and practitioners. Additionally, the minimalist nature of current Python data visualization libraries tends to speed up the creation of images, which discourages users from spending time integrating additional tools into these short scripts. This paper proposes yProv4DV, a lightweight library designed to make data visualization scripts reproducible through the use of provenance information while minimizing the need for code modifications. Through a single call, users can track inputs, outputs, and source code files, so that their data visualization software can be saved and fully reproduced. As a result, this library fills a gap in reproducible research workflows by addressing the reproducibility of plots in scientific publications.
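To make the idea of single-call tracking concrete, the following is a minimal sketch that uses only the Python standard library. It is not the yProv4DV API: the function name `record_provenance`, its parameters, and the JSON manifest layout are assumptions introduced purely for illustration. It shows how one call at the end of a plotting script could record content hashes for the script, its inputs, and its outputs, together with basic execution context.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_provenance(script, inputs, outputs, manifest="provenance.json"):
    """Hypothetical helper (NOT the yProv4DV API).

    Writes a JSON manifest describing the script, its input and output
    files, and the interpreter and platform used to run it.
    """
    def describe(paths):
        # Pair each file path with a hash of its current contents.
        return [{"path": str(p), "sha256": sha256_of(p)} for p in paths]

    record = {
        "created": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "script": describe([script])[0],
        "inputs": describe(inputs),
        "outputs": describe(outputs),
    }
    Path(manifest).write_text(json.dumps(record, indent=2))
    return record
```

Under these assumptions, a plotting script would end with a single line such as `record_provenance(__file__, ["data.csv"], ["figure.png"])`, where the file names are placeholders. Hashing file contents rather than storing only paths lets a later reader check that the data and figure at hand are the ones the script actually used and produced.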