Towards Effective Collaboration between Software Engineers and Data Scientists developing Machine Learning-Enabled Systems

Demand for incorporating Machine Learning (ML) into existing systems has grown among organizations. However, developing ML-enabled systems involves several social and technical challenges, which must be addressed by actors with different fields of expertise working together. This paper aims to understand how to enhance collaboration between two key actors in building these systems: software engineers and data scientists. We conducted two focus group sessions with experienced data scientists and software engineers working on real-world ML-enabled systems to assess the relevance of different recommendations for specific technical tasks. We found that collaboration between these actors is important for effectively developing ML-enabled systems, especially when defining data access and ML model deployment. Participants provided concrete examples of how recommendations from the literature can benefit collaboration during different tasks. For example, defining clear responsibilities for each team member and creating concise documentation can improve communication and overall performance. Our study contributes to a better understanding of how to foster effective collaboration between software engineers and data scientists creating ML-enabled systems.

