Machine learning (ML) is now used in software systems across many application fields, from medicine to software engineering (SE). Its popularity in industry is reflected in statistics on its growth and adoption. Its popularity is equally visible in research, particularly in SE, where studies have been published not only in SE conferences and journals but also in the workshops and events co-located with SE conferences. At the same time, researchers and practitioners have shown that ML poses particular challenges and pitfalls. In particular, research has shown that ML-enabled systems follow a different development process than traditional software, and that research also describes some of the challenges of ML applications. To mitigate some of the identified challenges and pitfalls, white and gray literature have proposed recommendations based on the authors' own experience and focused on their own domains (e.g., biomechanics), but to the best of our knowledge, there is no guideline focused on the SE community. This thesis aims to reduce this gap by answering research questions that help to understand the practices used and discussed by practitioners and researchers in the SE community. It analyzes possible sources of practices, such as question-and-answer communities and previous research studies, and presents a set of practices from an SE perspective.