Machine Learning (ML) models are widely used across various domains, including medical diagnostics and autonomous driving. To support this growth, cloud providers offer ML services that ease the integration of ML components into software systems. Evolving business requirements and the popularity of ML services have led practitioners of all skill levels to implement and maintain ML service-based systems. However, practitioners do not always adhere to optimal design and usage practices for ML cloud services, which results in common misuses that can significantly degrade the quality of ML service-based systems and adversely affect their maintenance and evolution. Although much research has been conducted on ML service misuse, a consistent terminology and specification for these misuses is still absent. In this paper, we therefore conduct a comprehensive, multi-vocal empirical study of the prevalence of ML cloud service misuses in practice. We propose a catalog of 20 ML cloud service misuses, most of which have not been studied in prior research. To build this catalog, we conducted a) a systematic literature review of studies on ML misuses, b) a gray literature review of the official documentation provided by major cloud providers, c) an empirical analysis of a curated set of 377 ML service-based systems on GitHub, and d) a survey of 50 ML practitioners. Our results show that ML service misuses are common in both open-source projects and industry, and that they often stem from a lack of understanding of service capabilities and from insufficient documentation. These findings emphasize the importance of ongoing education in best practices for ML services, which is the focus of this paper, and highlight the need for tools that automatically detect and refactor ML misuses.