Artificial intelligence has shifted from a software-centric discipline to an infrastructure-driven one. Training and inference at scale now depend on tightly interconnected data centers, high-capacity optical networks, and energy systems operating close to their physical and environmental limits. In this context, control over data and algorithms is no longer sufficient. Genuine AI sovereignty depends on the ability to deploy, operate, and adapt infrastructure under constraints such as energy availability, sustainability requirements, and network reach. This tutorial-survey introduces the concept of AI infrastructure sovereignty, defined as the ability of a region, operator, or nation to maintain operational control over AI systems within these constraints. The central idea is that sovereignty emerges from the joint design of three layers: AI-oriented data centers, optical transport networks, and control frameworks that provide real-time visibility and coordination across them. We first examine how AI workloads are reshaping data center design by pushing power densities higher, increasing cooling demands, and tightening the coupling with local energy systems. In this setting, factors such as carbon intensity and water usage become hard limits on where and how AI can be deployed. We then turn to optical networks as the backbone of distributed AI, showing how latency, capacity, failure domains, and jurisdictional boundaries directly shape what can be achieved in practice. Building on this foundation, the paper highlights telemetry, agentic AI, and digital twins as key enablers of operational sovereignty. Together, they make it possible to monitor, coordinate, and validate system behavior across the compute, network, and energy domains in a closed loop.