When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies

Large language models (LLMs) are increasingly embedded in high-stakes workflows, where failures propagate beyond isolated model errors into systemic breakdowns that can lead to legal exposure, reputational damage, and material financial losses. Building on this shift from model-centric risks to end-to-end system vulnerabilities, we analyze real-world AI incident reports and the mitigation actions they describe to derive an empirically grounded taxonomy that links failure dynamics to actionable interventions. Using a unified corpus of 9,705 media-reported AI incident articles, we extract explicit mitigation actions from 6,893 texts via structured prompting and then systematically classify these responses to extend MIT's AI Risk Mitigation Taxonomy. Our taxonomy introduces four new mitigation categories: 1) Corrective and Restrictive Actions, 2) Legal/Regulatory and Enforcement Actions, 3) Financial, Economic, and Market Controls, and 4) Avoidance and Denial. These capture response patterns that are becoming increasingly prevalent as AI deployment and regulation evolve. Quantitatively, we label the mitigation dataset with 32 distinct labels, producing 23,994 label assignments; 9,629 of these reflect previously unseen mitigation patterns, yielding a 67% increase in the original subcategory coverage and substantially enhancing the taxonomy's applicability to emerging systemic failure modes. By structuring incident responses, the paper strengthens "diagnosis-to-prescription" guidance and advances continuous, taxonomy-aligned post-deployment monitoring to prevent cascading incidents and downstream impact.
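
The counts reported above imply a few simple proportions, which the following minimal sketch works out. The four counts come from the abstract; the variable names and the derived ratios (extraction rate, share of novel assignments, mean labels per text) are illustrative assumptions and are not figures reported by the authors. The 67% coverage increase is not reproduced here, because it depends on subcategory counts that the abstract does not give.

```python
# Counts taken directly from the abstract.
corpus_articles = 9_705       # media-reported AI incident articles
texts_with_actions = 6_893    # texts with explicit mitigation actions
label_assignments = 23_994    # total label assignments (32 distinct labels)
novel_assignments = 9_629     # assignments reflecting previously unseen patterns

# Derived ratios (assumed interpretations, not reported in the source).
extraction_rate = texts_with_actions / corpus_articles
novel_share = novel_assignments / label_assignments
labels_per_text = label_assignments / texts_with_actions

print(f"Texts with explicit mitigation actions: {extraction_rate:.1%}")
print(f"Share of assignments in unseen patterns: {novel_share:.1%}")
print(f"Mean label assignments per text: {labels_per_text:.2f}")
```

Under these assumptions, roughly seven in ten articles yield an explicit mitigation action, and about two in five label assignments fall outside the patterns covered by the original taxonomy.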
