Domain experts can play a crucial role in guiding data scientists to optimize machine learning models while ensuring contextual relevance for downstream use. However, in current workflows, such collaboration is challenging due to differing expertise, abstract documentation practices, and limited access to and visibility into low-level implementation artifacts. To address these challenges and enable domain expert participation, we introduce CellSync, a collaboration framework comprising (1) a Jupyter Notebook extension that continuously tracks changes to dataframes and model metrics and (2) a large language model (LLM)-powered visualization dashboard that makes those changes interpretable to domain experts. Through CellSync's cell-level dataset visualizations and code summaries, domain experts can interactively examine how individual data and modeling operations affect different data segments. The chat features enable data-centric conversations and targeted feedback to data scientists. Our preliminary evaluation shows that CellSync provides transparency and promotes critical discussion of the intents and implications of data operations.
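To make the idea of cell-level dataframe tracking concrete, the following is a minimal sketch of how two snapshots of a dataframe, taken before and after a notebook cell runs, might be compared. It is not CellSync's implementation: the function name `summarize_change`, the fields of the returned summary, and the example data are all assumptions introduced here for illustration, and pandas is assumed to be installed.

```python
import pandas as pd


def summarize_change(before: pd.DataFrame, after: pd.DataFrame) -> dict:
    """Compare two snapshots of a dataframe (hypothetical helper, not CellSync's API)."""
    before_cols, after_cols = set(before.columns), set(after.columns)
    shared = sorted(before_cols & after_cols)
    return {
        "rows_before": len(before),
        "rows_after": len(after),
        "columns_added": sorted(after_cols - before_cols),
        "columns_removed": sorted(before_cols - after_cols),
        # Change in missing-value count per shared column (after minus before).
        "null_delta": {
            col: int(after[col].isna().sum() - before[col].isna().sum())
            for col in shared
        },
    }


# Hypothetical cell: drop rows with a missing age, then derive a new column.
before = pd.DataFrame({"age": [34, None, 51], "income": [40.0, 52.0, None]})
after = before.dropna(subset=["age"]).assign(senior=lambda d: d["age"] > 50)
summary = summarize_change(before, after)
print(summary)
```

A summary of this kind (rows dropped, columns added, shifts in missing values) is the sort of per-cell signal that a dashboard could surface to a domain expert alongside a plain-language description of the code.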