Artificial intelligence has shifted from a software-centric discipline to an infrastructure-driven one. Large-scale training and inference increasingly depend on tightly coupled data centers, high-capacity optical networks, and energy systems operating close to physical and environmental limits. As a result, control over data and algorithms alone is no longer sufficient to achieve meaningful AI sovereignty. Practical sovereignty now depends on who can deploy, operate, and adapt AI infrastructure under constraints imposed by energy availability, sustainability targets, and network reach. This tutorial-survey introduces the concept of AI infrastructure sovereignty, defined as the ability of a region, operator, or nation to exercise operational control over AI systems within physical and environmental limits. We argue that sovereignty emerges from the co-design of three layers: AI-oriented data centers, optical transport networks, and automation frameworks that provide real-time visibility and control. We analyze how AI workloads reshape data center design, driving extreme power densities, advanced cooling requirements, and tighter coupling to local energy systems, with sustainability metrics such as carbon intensity and water usage acting as hard deployment boundaries. We then examine optical networks as the backbone of distributed AI, showing how latency, capacity, failure domains, and jurisdictional control define practical sovereignty limits. Building on this foundation, we position telemetry, agentic AI, and digital twins as enablers of operational sovereignty through validated, closed-loop control across compute, network, and energy domains. The tutorial concludes with a reference architecture for sovereign AI infrastructure that integrates telemetry pipelines, agent-based control, and digital twins, framing sustainability as a first-order design constraint.