The AI Skills Shift: Mapping Skill Obsolescence, Emergence, and Transition Pathways in the LLM Era

As Large Language Models reshape the global labor market, policymakers and workers need empirical data on which occupational skills are most susceptible to automation. We present the Skill Automation Feasibility Index (SAFI), which benchmarks four frontier LLMs -- LLaMA 3.3 70B, Mistral Large, Qwen 2.5 72B, and Gemini 2.5 Flash -- on 263 text-based tasks spanning all 35 skills in the U.S. Department of Labor's O*NET taxonomy (1,052 total model calls, 0% failure rate). Cross-referencing these results with real-world AI adoption data from the Anthropic Economic Index (756 occupations, 17,998 tasks), we propose an AI Impact Matrix -- an interpretive framework that places each skill in one of four quadrants: High Displacement Risk, Upskilling Required, AI-Augmented, and Lower Displacement Risk. Key findings: (1) Mathematics (SAFI: 73.2) and Programming (71.8) receive the highest automation feasibility scores, while Active Listening (42.2) and Reading Comprehension (45.5) receive the lowest; (2) we observe a "capability-demand inversion", in which the skills most demanded in AI-exposed jobs are those on which LLMs perform worst in our benchmark; (3) 78.7% of observed AI interactions are augmentation, not automation; (4) all four models converge to similar skill profiles (3.6-point spread), suggesting that text-based automation feasibility may be more skill-dependent than model-dependent. SAFI measures LLM performance on text-based representations of skills, not full occupational execution. All data, code, and model responses are open-sourced.
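
The arithmetic behind two of the figures above can be checked with a short sketch. It uses only numbers quoted in the abstract. It assumes each model was called once per task, which is consistent with the reported total of 1,052 calls. Comparing the gap between the highest and lowest reported skill scores with the reported cross-model spread is an illustration added here, not the paper's own analysis, and the function names are hypothetical.

```python
# Figures quoted in the abstract; nothing here is recomputed from raw data.
MODELS = ["LLaMA 3.3 70B", "Mistral Large", "Qwen 2.5 72B", "Gemini 2.5 Flash"]
NUM_TASKS = 263

# Only the four SAFI scores the abstract reports (the two highest and two lowest).
REPORTED_SAFI = {
    "Mathematics": 73.2,
    "Programming": 71.8,
    "Reading Comprehension": 45.5,
    "Active Listening": 42.2,
}

# Reported spread across the four models, in SAFI points.
MODEL_SPREAD = 3.6


def total_calls(num_tasks, models):
    """Number of model calls, assuming every model runs every task once."""
    return num_tasks * len(models)


def score_range(scores):
    """Gap between the highest and lowest score, rounded to one decimal."""
    return round(max(scores.values()) - min(scores.values()), 1)


calls = total_calls(NUM_TASKS, MODELS)
skill_range = score_range(REPORTED_SAFI)
ratio = skill_range / MODEL_SPREAD

print(f"total model calls: {calls}")
print(f"gap between top and bottom reported skills: {skill_range} points")
print(f"gap across skills / spread across models: {ratio:.1f}x")
```

Under these assumptions the gap between the top and bottom reported skills (31.0 points) is several times larger than the 3.6-point spread across models, which is the pattern finding (4) describes.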

