As AI systems become more advanced, companies and regulators will make difficult decisions about whether it is safe to train and deploy them. To prepare for these decisions, we investigate how developers could make a 'safety case,' which is a structured rationale that AI systems are unlikely to cause a catastrophe. We propose a framework for organizing a safety case and discuss four categories of arguments to justify safety: total inability to cause a catastrophe, sufficiently strong control measures, trustworthiness despite capability to cause harm, and deference to credible AI advisors. We evaluate concrete examples of arguments in each category and outline how arguments could be combined to justify that AI systems are safe to deploy.