Quality Assurance for Artificial Intelligence: A Study of Industrial Concerns, Challenges and Best Practices

Quality Assurance (QA) aims to prevent mistakes and defects in manufactured products and to avoid problems when delivering products or services to customers. QA for AI systems, however, poses particular challenges, given their data-driven and non-deterministic nature as well as their more complex architectures and algorithms. While there is growing empirical evidence about machine learning practices in industrial contexts, little is known about the challenges and best practices of quality assurance for AI systems (QA4AI). In this paper, we report on a mixed-method study of QA4AI in industry practice across various countries and companies. Through interviews with 15 industry practitioners and a validation survey with 50 practitioner responses, we studied the concerns, challenges and best practices in ensuring the QA4AI properties reported in the literature, such as correctness, fairness and interpretability, among others. Our findings suggest that correctness is the most important property, followed by model relevance, efficiency and deployability. In contrast, transferability (applying knowledge learned in one task to another task), security and fairness receive less attention from practitioners than the other properties. Challenges and solutions are identified for each QA4AI property. For example, for efficiency, interviewees highlighted the challenge of trading off latency, cost and accuracy (latency and cost are both part of the efficiency concern); solutions such as model compression are proposed. We identified 21 QA4AI practices across the stages of AI development, of which 10 were well recognized and another 8 were marginally agreed upon by the surveyed practitioners.