Context: Software startups already struggle to apply sound software engineering practices, and the non-deterministic nature of machine learning (ML) techniques makes this even harder for ML startups. Objective: This study aims to build a comprehensive picture of the software engineering (SE) practices that ML startups follow and to identify their additional needs. Method: We conducted a systematic literature review of 37 papers published in the last 21 years, covering both general software startups and ML startups. We collected data on SE practices in five phases of the software development life cycle: requirements engineering, design, development, quality assurance, and deployment. Results: We find several differences between the SE practices of ML startups and those of general software startups. The most prominent concern the data management and model learning phases. Conclusion: Although ML startups face many of the same challenges as general software startups, the additional difficulties of working with stochastic ML models call for different strategies in applying SE practices to produce high-quality products.