Frontier artificial intelligence (AI) systems could pose increasing risks to public safety and security. But what level of risk is acceptable? One increasingly popular approach is to define capability thresholds, which describe AI capabilities beyond which an AI system is deemed to pose too much risk. A more direct approach is to define risk thresholds that simply state how much risk would be too much. For instance, they might state that the likelihood of cybercriminals using an AI system to cause X amount of economic damage must not increase by more than Y percentage points. The main upside of risk thresholds is that they are more principled than capability thresholds, but the main downside is that they are more difficult to evaluate reliably. For this reason, we currently recommend that companies (1) define risk thresholds to provide a principled foundation for their decision-making, (2) use these risk thresholds to help set capability thresholds, and then (3) primarily rely on capability thresholds to make their decisions. Regulators should also explore the area because, ultimately, they are the most legitimate actors to define risk thresholds. If AI risk estimates become more reliable, risk thresholds should arguably play an increasingly direct role in decision-making.
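As a rough illustration, the cybercrime example above could be written as a bound on an increase in probability. The notation is an assumption made here for clarity, not a formalization taken from the text: D denotes the economic damage caused by cybercriminals over some fixed period, X is the damage level of concern (read here as "at least X"), Y is the permitted increase in percentage points, and "baseline" refers to the counterfactual world in which the AI system is not available.

```latex
\[
  \Pr\bigl(D \ge X \mid \text{AI system available}\bigr)
  \;-\;
  \Pr\bigl(D \ge X \mid \text{baseline}\bigr)
  \;\le\; \frac{Y}{100}
\]
```

Under this reading, the difficulty of evaluating a risk threshold reliably comes down to estimating both probabilities on the left-hand side, whereas a capability threshold replaces that estimate with a test of what the system can do.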