As artificial intelligence (AI) systems become more advanced, concerns about large-scale risks from misuse or accidents have grown. This report analyzes the technical research into safe AI development conducted by three leading AI companies: Anthropic, Google DeepMind, and OpenAI. We define safe AI development as developing AI systems that are unlikely to pose large-scale misuse or accident risks. This encompasses a range of technical approaches aimed at ensuring AI systems behave as intended and do not cause unintended harm, even as they are made more capable and autonomous. We analyzed all papers published by the three companies from January 2022 to July 2024 that were relevant to safe AI development, and categorized the 61 included papers into eight safety approaches. We also noted three categories representing nascent approaches explored by academia and civil society but not currently represented in any papers by the three companies. Our analysis reveals where corporate attention is concentrated and where potential gaps lie. Some AI research may remain unpublished for good reasons, for example to avoid informing adversaries about the security techniques they would need to overcome to misuse AI systems. We therefore also considered the incentives that AI companies have to research each approach, in particular reputational effects, regulatory burdens, and whether an approach could make AI systems more useful. We identified three categories with few or no papers and where we do not expect AI companies to become substantially more incentivized to pursue the research in the future: multi-agent safety, model organisms of misalignment, and safety by design. Our findings indicate that these approaches may progress slowly without funding or efforts from governments, civil society, philanthropists, or academia.