LLMOrbit: A Circular Taxonomy of Large Language Models - From Scaling Walls to Agentic AI Systems

The field of artificial intelligence has progressed from foundational Transformer architectures to reasoning-capable systems that approach human-level performance. We present LLMOrbit, a comprehensive circular taxonomy of the large language model landscape from 2019 to 2025. This survey examines over 50 models from 15 organizations through eight interconnected orbital dimensions, documenting the architectural innovations, training methodologies, and efficiency patterns that define modern LLMs, generative AI, and agentic systems. We identify three critical crises: (1) data scarcity (a 9-27T token supply projected to be depleted by 2026-2028), (2) exponential cost growth ($3M to $300M+ in 5 years), and (3) unsustainable energy consumption (a 22x increase). Together these establish the scaling wall that limits brute-force approaches. Our analysis reveals six paradigms that break this wall: (1) test-time compute (o1 and DeepSeek-R1 achieve GPT-4 performance with 10x inference compute), (2) quantization (4-8x compression), (3) distributed edge computing (10x cost reduction), (4) model merging, (5) efficient training (ORPO reduces memory by 50%), and (6) small specialized models (Phi-4 14B matches larger models). Three paradigm shifts emerge: (1) post-training gains (RLHF, GRPO, and pure RL contribute substantially, with DeepSeek-R1 achieving 79.8% on MATH), (2) an efficiency revolution (MoE routing delivers 18x efficiency, and the 8x KV cache compression of Multi-head Latent Attention enables GPT-4-level performance at under $0.30 per million tokens), and (3) democratization (open-source Llama 3 scores 88.6% on MMLU, surpassing GPT-4 at 86.4%). We provide insights into key techniques (RLHF, PPO, DPO, GRPO, ORPO), trace the evolution from passive generation to tool-using agents (ReAct, RAG, multi-agent systems), and analyze post-training innovations.
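As a rough illustration of the arithmetic behind two of the figures above, the following sketch computes the yearly multiplier implied by the $3M to $300M cost growth over 5 years, and the compression ratio obtained by lowering weight precision. It rests on two assumptions that the abstract does not state: that cost grows at a constant compound rate, and that the 4-8x compression range corresponds to moving from 32-bit weights to 8-bit and 4-bit weights. The function names are hypothetical and chosen only for this example.

```python
def annual_growth_factor(start: float, end: float, years: float) -> float:
    """Constant yearly multiplier that turns `start` into `end` after `years`.

    Assumes smooth compound growth, which is a simplification.
    """
    return (end / start) ** (1.0 / years)


def compression_ratio(baseline_bits: int, quantized_bits: int) -> float:
    """Storage reduction from lowering the number of bits per weight.

    Ignores overhead such as scales and zero points (an assumption).
    """
    return baseline_bits / quantized_bits


if __name__ == "__main__":
    # Cost figures taken from the abstract: $3M to $300M over 5 years.
    factor = annual_growth_factor(3e6, 300e6, 5)
    print(f"Implied cost growth: about {factor:.2f}x per year")

    # Assumed 32-bit baseline; the abstract gives only the 4-8x range.
    for bits in (8, 4):
        ratio = compression_ratio(32, bits)
        print(f"32-bit to {bits}-bit weights: {ratio:.0f}x compression")
```

Under these assumptions, a 100x increase over 5 years corresponds to costs multiplying by roughly 2.5 each year, and the two ends of the quantization range correspond to 8-bit and 4-bit weights.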
