We propose establishing an office to oversee AI systems by introducing a tiered system of explainability and benchmarking requirements for commercial AI systems. We examine how complex, high-risk technologies have been successfully regulated at the national level. Specifically, we draw parallels to the existing regulation of the U.S. medical device and pharmaceutical industries (regulated by the FDA), the proposed AI legislation in the European Union (the AI Act), and existing U.S. anti-discrimination legislation. To promote accountability and user trust, AI accountability mechanisms shall introduce standardized measures for each category of intended high-risk use of AI systems to enable structured comparisons among such systems. We suggest using explainable AI techniques, such as input influence measures, alongside fairness statistics and other performance measures of high-risk AI systems. We propose standardizing internal benchmarking and automated audits to transparently characterize high-risk AI systems. The results of such audits and benchmarks shall be clearly and transparently communicated and explained via a public AI registry to enable meaningful comparisons of competing AI systems. Such standardized audits, benchmarks, and certificates shall be specific to the intended high-risk use of the respective AI systems and could constitute conformity assessments for AI systems, e.g., under the European Union's AI Act.
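To make the notion of a "fairness statistic" in a standardized audit concrete, the sketch below computes one common example, the demographic parity difference: the gap in positive prediction rates between two demographic groups. This is an illustrative assumption on our part, not a metric mandated by the proposal; the function names, threshold idea, and data are hypothetical.

```python
# Hypothetical sketch of one fairness statistic a standardized audit
# might report: demographic parity difference, i.e., the gap in
# positive prediction rates between two groups. Data is illustrative.

def positive_rate(predictions):
    """Fraction of positive (1) predictions in a group."""
    return sum(predictions) / len(predictions)

def demographic_parity_difference(preds_group_a, preds_group_b):
    """Absolute gap in positive prediction rates between two groups.

    A value near 0 suggests similar treatment across groups; a public
    benchmark could flag systems whose gap exceeds a regulator-defined
    threshold for their intended high-risk use.
    """
    return abs(positive_rate(preds_group_a) - positive_rate(preds_group_b))

# Example: a high-risk model's binary decisions for two groups.
group_a = [1, 1, 0, 1, 0, 1, 1, 0]  # 5/8 = 0.625 positive rate
group_b = [1, 0, 0, 1, 0, 0, 1, 0]  # 3/8 = 0.375 positive rate
gap = demographic_parity_difference(group_a, group_b)
print(f"Demographic parity difference: {gap:.3f}")  # 0.250
```

Reporting such a statistic per intended use category, alongside input influence measures and accuracy figures, is the kind of structured, comparable entry a public AI registry could hold.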