AI governance efforts increasingly rely on audit standards: agreed-upon practices for conducting audits. However, a poorly designed standard can obscure a system's inadequacies while lending it credibility. We explore how an audit standard's design influences its effectiveness through a case study of ASB 018, a standard for auditing probabilistic genotyping software, which the U.S. criminal legal system increasingly uses to analyze DNA samples. Through qualitative analysis of ASB 018 and five audit reports, we identify numerous gaps between the standard's desired outcomes and the auditing practices it enables. For instance, ASB 018 envisions that compliant audits establish restrictions on software use based on observed failures, yet an audit can comply without establishing any such boundaries. We trace these gaps to features of how the standard's requirements are designed, such as vague language and undefined terms. We conclude with recommendations for designing audit standards and evaluating their effectiveness.