As AI systems become more advanced, companies and regulators will make difficult decisions about whether it is safe to train and deploy them. To prepare for these decisions, we investigate how developers could make a 'safety case,' which is a structured rationale that AI systems are unlikely to cause a catastrophe. We propose a framework for organizing a safety case and discuss four categories of arguments to justify safety: total inability to cause a catastrophe, sufficiently strong control measures, trustworthiness despite capability to cause harm, and -- if AI systems become much more powerful -- deference to credible AI advisors. We evaluate concrete examples of arguments in each category and outline how arguments could be combined to justify that AI systems are safe to deploy.