Artificial intelligence has shifted from a software-centric discipline to an infrastructure-driven one. Large-scale training and inference increasingly depend on tightly coupled data centers, high-capacity optical networks, and energy systems operating close to physical and environmental limits. As a result, control over data and algorithms alone is no longer sufficient for meaningful AI sovereignty. Practical sovereignty now depends on who can deploy, operate, and adapt AI infrastructure under constraints imposed by energy availability, sustainability targets, and network reach. This tutorial-survey introduces the concept of AI infrastructure sovereignty, defined as the ability of a region, operator, or nation to exercise operational control over AI systems within physical and environmental limits. We argue that sovereignty emerges from the co-design of three layers: AI-oriented data centers, optical transport networks, and automation frameworks that provide real-time visibility and control. We analyze how AI workloads reshape data center design, driving extreme power densities, advanced cooling requirements, and tighter coupling to local energy systems, with sustainability metrics such as carbon intensity and water usage acting as hard deployment boundaries. We then examine optical networks as the backbone of distributed AI, showing how latency, capacity, failure domains, and jurisdictional control define practical sovereignty limits. Building on this foundation, we position telemetry, agentic AI, and digital twins as enablers of operational sovereignty through validated, closed-loop control across compute, network, and energy domains. We conclude with a reference architecture for sovereign AI infrastructure that integrates telemetry pipelines, agent-based control, and digital twins, framing sustainability as a first-order design constraint.