Artificial intelligence has shifted from a software-centric discipline to an infrastructure-driven system. Large-scale training and inference increasingly depend on tightly coupled data centers, high-capacity optical networks, and energy systems operating close to physical and environmental limits. As a result, control over data and algorithms alone is no longer sufficient to achieve meaningful AI sovereignty. Practical sovereignty now depends on who can deploy, operate, and adapt AI infrastructure under constraints imposed by energy availability, sustainability targets, and network reach. This tutorial-survey introduces the concept of AI infrastructure sovereignty, defined as the ability of a region, operator, or nation to exercise operational control over AI systems within physical and environmental limits. The paper argues that sovereignty emerges from the co-design of three layers: AI-oriented data centers, optical transport networks, and automation frameworks that provide real-time visibility and control. We analyze how AI workloads reshape data center design, driving extreme power densities, advanced cooling requirements, and tighter coupling to local energy systems, with sustainability metrics such as carbon intensity and water usage acting as hard deployment boundaries. We then examine optical networks as the backbone of distributed AI, showing how latency, capacity, failure domains, and jurisdictional control define practical sovereignty limits. Building on this foundation, the paper positions telemetry, agentic AI, and digital twins as enablers of operational sovereignty through validated, closed-loop control across compute, network, and energy domains. The tutorial concludes with a reference architecture for sovereign AI infrastructure that integrates telemetry pipelines, agent-based control, and digital twins, framing sustainability as a first-order design constraint.