To effectively address potential harms from AI systems, it is essential to identify and mitigate system-level hazards. Current analysis approaches examine individual components of an AI system, such as training data or models, in isolation, overlooking hazards that arise from component interactions or from how components are situated within a company's development process. To address this gap, we draw on the established field of system safety, which treats safety as an emergent property of the entire system rather than of its components alone. In this work, we translate System-Theoretic Process Analysis (STPA) - a recognized system safety framework - to the analysis of AI operation and development processes. We focus on systems that rely on machine learning algorithms and conduct STPA on three case studies involving linear regression, reinforcement learning, and transformer-based generative models. Our analysis explored how STPA's control and system-theoretic perspectives apply to AI systems and whether unique AI traits - such as model opacity, capability uncertainty, and output complexity - necessitate significant modifications to the framework. We find that the key concepts and steps of conducting an STPA readily apply, albeit with a few adaptations tailored for AI systems. We present the Process-oriented Hazard Analysis for AI Systems (PHASE) as a guideline that adapts STPA concepts for AI, making STPA-based hazard analysis more accessible. PHASE enables four key affordances for analysts responsible for managing AI system harms: 1) detection of hazards at the systems level, including those arising from the accumulation of disparate issues; 2) explicit acknowledgment of social factors contributing to experiences of algorithmic harm; 3) creation of traceable accountability chains between harms and the parties who can mitigate them; and 4) ongoing monitoring and mitigation of new hazards.