Is your AI Model Accurate Enough? The Difficult Choices Behind Rigorous AI Development and the EU AI Act

Technical and legal debates frequently treat "accuracy" as an objective, measurable, and purely technical property. We challenge this view, showing that evaluating AI performance fundamentally depends on context-dependent normative decisions. These techno-normative choices are crucial for rigorous AI deployment, as they determine which errors are prioritised, how risks are distributed, and how trade-offs between competing objectives are resolved. This paper provides a legal-technical analysis of the choices that shape how accuracy is defined, measured, and assessed, using the 2024 European Union AI Act -- which mandates an "appropriate level of accuracy" for high-risk systems -- as a primary case study. We identify and analyse four choices central to any robust performance evaluation: (1) selecting metrics, (2) balancing multiple metrics, (3) measuring metrics against representative data, and (4) determining acceptance thresholds. For each choice, we examine its relationship to the AI Act's accuracy requirement and associated documentation obligations, show how its technical implementation embeds implicit or explicit assumptions about acceptable risks, errors, and trade-offs, and discuss the implications for the practical implementation of the AI Act, drawing on examples and related technical standards. By making the techno-normative dimensions of accuracy explicit, this paper contributes to broader interdisciplinary debates on AI governance and regulation, and offers specific guidance for regulators, auditors, and developers tasked with translating (legal) safety requirements into technical practice.
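
As a minimal illustration of why the choice of metric and decision threshold is not purely technical, consider the following sketch. The data, the score values, and the two thresholds are hypothetical and chosen only for illustration; they are not taken from the paper or from the AI Act. The sketch shows two decision thresholds that yield the same overall accuracy while distributing errors very differently between missed positives and false alarms.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(y_true, scores, threshold):
    """Return accuracy, recall, and precision at a given decision threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical toy data: 2 positive cases among 10 (an assumption for illustration).
Y_TRUE = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
SCORES = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1]

strict = metrics(Y_TRUE, SCORES, threshold=0.7)    # misses one positive case
lenient = metrics(Y_TRUE, SCORES, threshold=0.35)  # raises one false alarm

if __name__ == "__main__":
    print("strict :", strict)
    print("lenient:", lenient)
```

Both thresholds report an accuracy of 0.9 on this toy data, yet the stricter one misses half of the positive cases while the more lenient one catches all of them at the cost of a false alarm. Which of the two is "appropriate" depends on which error is considered more harmful in the deployment context, which is a normative rather than a technical question.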
