There is a gap between how people explore data and how Jupyter-like computational notebooks are designed. People explore data nonlinearly, using execution undos, branching, or complete reverts, whereas notebooks are designed for sequential exploration. Recent systems such as ForkIt still fall short of supporting these modes of nonlinear exploration in a unified way. In this work, we address this challenge by introducing two-dimensional code+data space versioning for computational notebooks, and we verify its effectiveness using our prototype system, Kishuboard, which integrates with Jupyter. By adjusting separate code and data knobs, Kishuboard users can manage the state of a computational notebook intuitively and flexibly, achieving both execution rollbacks and checkouts across a complex multi-branch exploration history. Moreover, this two-dimensional versioning mechanism can easily be presented as a friendly one-dimensional history. Human subject studies indicate that Kishuboard significantly enhances user productivity in various data science tasks.
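To make the idea of independent code and data knobs concrete, the following is a minimal Python sketch of two-dimensional versioning. It is an illustration under stated assumptions, not Kishuboard's implementation: the names `Commit`, `TwoDimHistory`, `checkout`, and the `code`/`data` flags are hypothetical, notebook code is modeled as a list of cell source strings, and kernel state is modeled as a plain dictionary that is deep-copied at every commit.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Commit:
    """One point in the exploration history (hypothetical structure)."""
    commit_id: int
    parent_id: Optional[int]
    cells: List[str]            # code dimension: notebook cell sources
    variables: Dict[str, Any]   # data dimension: kernel variable state


class TwoDimHistory:
    """Toy history in which code and data can be restored independently."""

    def __init__(self) -> None:
        self.commits: Dict[int, Commit] = {}
        self.head: Optional[int] = None
        self.cells: List[str] = []
        self.variables: Dict[str, Any] = {}

    def commit(self) -> int:
        """Snapshot the current code and data as a child of the current head."""
        cid = len(self.commits)
        self.commits[cid] = Commit(
            cid, self.head, list(self.cells), copy.deepcopy(self.variables)
        )
        self.head = cid
        return cid

    def checkout(self, commit_id: int, *, code: bool = True, data: bool = True) -> None:
        """Restore the selected dimension(s) from a past commit.

        code=False, data=True models an execution rollback (keep the code,
        undo the data); code=True, data=True models a full checkout.
        Assumption: the next commit branches from the checked-out commit.
        """
        target = self.commits[commit_id]
        if code:
            self.cells = list(target.cells)
        if data:
            self.variables = copy.deepcopy(target.variables)
        self.head = commit_id

    def children(self, commit_id: int) -> List[int]:
        """Return the ids of commits that branch from the given commit."""
        return [c.commit_id for c in self.commits.values() if c.parent_id == commit_id]


# Usage: two commits, an execution rollback that starts a new branch,
# then a full checkout back to the first branch.
history = TwoDimHistory()
history.cells = ["df = load()"]
history.variables = {"df": [1, 2, 3]}
c0 = history.commit()

history.cells.append("df = clean(df)")
history.variables["df"] = [1, 2]
c1 = history.commit()

history.checkout(c0, code=False, data=True)   # data knob only: undo the execution
history.cells.append("df = sort_desc(df)")
history.variables["df"] = [3, 2, 1]
c2 = history.commit()                         # branches from c0

history.checkout(c1)                          # both knobs: full checkout of c1
```

In this sketch, the linear list of commit ids plays the role of a one-dimensional history, while each commit carries both dimensions. Deep-copying the whole variable dictionary is a simplification chosen for clarity; a practical system would need a more economical way to snapshot large data.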