Computational Safety for Generative AI: A Hypothesis Testing Perspective

AI safety is a rapidly growing area of research that seeks to prevent the harm and misuse of frontier AI technology, particularly with respect to generative AI (GenAI) tools that are capable of creating realistic and high-quality content through text prompts. Examples of such tools include large language models (LLMs) and text-to-image (T2I) diffusion models. As the performance of various leading GenAI models approaches saturation due to similar training data sources and neural network architecture designs, the development of reliable safety guardrails has become a key differentiator for responsibility and sustainability. This paper presents a formalization of the concept of computational safety, which is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI through the lens of signal processing theory and methods. In particular, we explore two exemplary categories of computational safety challenges in GenAI that can be formulated as hypothesis testing problems. For the safety of model input, we show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts. For the safety of model output, we elucidate how statistical signal processing can be used to detect AI-generated content. Finally, we discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
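As a rough illustration of the hypothesis testing view described above, the sketch below treats a safety check as a binary test: H0 means the input or output is benign (or human-written), and H1 means it is a jailbreak attempt (or AI-generated). It assumes some scalar detection score is already available. The score itself, the function names `calibrate_threshold` and `decide`, and the false-positive-rate calibration rule are illustrative assumptions, not the method from the paper.

```python
def calibrate_threshold(null_scores, target_fpr):
    """Pick a threshold from scores observed under H0 (benign samples).

    Hypothetical helper: at most a `target_fpr` fraction of the H0
    calibration scores lie strictly above the returned threshold.
    """
    if not null_scores:
        raise ValueError("null_scores must be non-empty")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ordered = sorted(null_scores)
    n = len(ordered)
    allowed = int(target_fpr * n)   # H0 samples allowed above the threshold
    return ordered[n - allowed - 1]


def decide(score, threshold):
    """Reject H0 (flag the sample) when the score exceeds the threshold."""
    return "H1" if score > threshold else "H0"


def empirical_fpr(null_scores, threshold):
    """Fraction of H0 calibration scores that would be flagged."""
    flagged = sum(1 for s in null_scores if decide(s, threshold) == "H1")
    return flagged / len(null_scores)
```

In practice the score might come from a sensitivity or loss-landscape statistic for prompts, or from a statistical detector for generated content. The threshold trades false alarms on benign samples against missed detections.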

