AI safety practitioners invest considerable resources in AI system evaluations, but these investments may be wasted if evaluations fail to realize their intended impact. This paper questions the core value proposition of evaluations: that they significantly improve our understanding of AI risks and, consequently, our ability to mitigate those risks. Evaluations may fail to improve understanding in six ways, for example because risks manifest beyond the AI system itself or because evaluations yield insignificant returns compared to real-world observations. Improved understanding may also fail to lead to better risk mitigation in four ways, including challenges in upholding and enforcing commitments. Evaluations could even be harmful, for example by triggering the weaponization of dual-use capabilities or imposing high opportunity costs on AI safety. This paper concludes with considerations for improving evaluation practices and 12 recommendations for AI labs, external evaluators, regulators, and academic researchers to encourage a more strategic and impactful approach to AI risk assessment and mitigation.