Safety cases, structured arguments that a system is acceptably safe, are becoming central to the governance of AI systems. Yet traditional safety-case practices from aviation and nuclear engineering rely on well-specified system boundaries, stable architectures, and known failure modes. Modern AI systems such as generative and agentic AI defy all three assumptions: their capabilities emerge unpredictably from low-level training objectives, their behaviour varies with prompts, and their risk profiles shift through fine-tuning, scaffolding, or deployment context. This study examines how safety cases are currently constructed for AI systems and why classical approaches fail to capture these dynamics. It then proposes a framework of reusable safety-case templates, each following a predefined structure of claims, arguments, and evidence tailored to AI systems. The framework introduces comprehensive taxonomies for AI-specific claim types (assertion-based, constraint-based, capability-based), argument types (demonstrative, comparative, causal/explanatory, risk-based, and normative), and evidence families (empirical, mechanistic, comparative, expert-driven, formal methods, operational/field data, and model-based). Each template is illustrated through end-to-end patterns addressing distinctive challenges such as evaluation without ground truth, dynamic model updates, and threshold-based risk decisions. The result is a systematic, composable, and reusable approach to constructing and maintaining safety cases that are credible, auditable, and adaptive to the evolving behaviour of generative and frontier AI systems.