Large Language Models (LLMs) are increasingly used in data science for tasks such as data preprocessing and analytics. However, data scientists encounter substantial obstacles when conversing with LLM-powered chatbots and acting on their suggestions and answers. To identify these challenges, we conducted a mixed-methods study comprising contextual observations, semi-structured interviews (n=14), and a survey (n=114). Our findings highlight key issues faced by data scientists, including retrieving contextual data, formulating prompts for complex tasks, adapting generated code to local environments, and refining prompts iteratively. Based on these insights, we propose actionable design recommendations, such as data brushing to support context selection and inquisitive feedback loops to improve communication with AI-based assistants in data-science tools.