Human-AI Safety: A Descendant of Generative AI and Control Systems Safety

Generative artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human-AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in an isolated context: they are tightly entangled with the responses and behavior of human users over time. In this position paper, we argue that meaningful safety assurances for these AI technologies can only be achieved by reasoning about how the feedback loop formed by the AI's outputs and human behavior may drive the interaction towards different outcomes. To this end, we envision a high-value window of opportunity to bridge the rapidly growing capabilities of generative AI and the dynamical safety frameworks from control theory, laying a new foundation for human-centered AI safety in the coming decades.
