As artificial intelligence (AI) systems become more advanced, concerns about large-scale risks from misuse or accidents have grown. This report analyzes the technical research into safe AI development conducted by three leading AI companies: Anthropic, Google DeepMind, and OpenAI. We define safe AI development as developing AI systems that are unlikely to pose large-scale misuse or accident risks. This encompasses a range of technical approaches aimed at ensuring AI systems behave as intended and do not cause unintended harm, even as they are made more capable and autonomous. We analyzed all papers published by the three companies from January 2022 to July 2024 that were relevant to safe AI development, and categorized the 61 included papers into eight safety approaches. We also noted three categories representing nascent approaches explored by academia and civil society but not currently represented in any papers by the three companies. Our analysis reveals where corporate attention is concentrated and where potential gaps lie. Some AI research may remain unpublished for good reasons, for example to avoid informing adversaries about the security techniques they would need to overcome to misuse AI systems. We therefore also considered the incentives that AI companies have to research each approach, in particular reputational effects, regulatory burdens, and whether an approach could make AI systems more useful. We identified three categories with few or no papers and where we do not expect AI companies to become substantially more incentivized to pursue the research in the future: multi-agent safety, model organisms of misalignment, and safety by design. Our findings indicate that these approaches may progress slowly without funding or efforts from governments, civil society, philanthropists, or academia.