The performance of AI models on safety benchmarks does not indicate how they will perform in the real world after deployment. This opacity undermines existing regulatory frameworks built on benchmark performance, leaving them unable to mitigate ongoing real-world harm. The problem stems from a fundamental challenge in AI interpretability that regulators and decision-makers seem to overlook. We propose a simple, realistic, and readily usable regulatory framework that does not rely on benchmarks, and we call for interdisciplinary collaboration to find new ways to address this crucial problem.