Aligning data quality with regulatory requirements in machine learning (ML) systems is a critical challenge for practitioners navigating the evolving EU regulatory landscape. To address it, we first propose a practical framework that maps established data quality dimensions to specific EU regulatory requirements. Second, we conducted a comprehensive online survey of over 180 EU-based data practitioners, investigating their approaches, key challenges, and unmet needs when ensuring that data quality in ML systems meets regulatory requirements. Our findings reveal gaps between current practices and regulatory expectations, and underscore the need for more integrated data quality tools and closer collaboration between technical and legal practitioners. These insights inform recommendations for bridging technical expertise and regulatory compliance, ultimately fostering responsible and trustworthy ML deployments.