Surpassing legacy approaches and human intelligence with hybrid single- and multi-objective Reinforcement Learning-based optimization and interpretable AI to enable the economic operation of the US nuclear fleet

February 16, 2024

Paul Seurin, Koroush Shirvan

The nuclear sector is the primary source of carbon-free energy in the United States. Nevertheless, existing nuclear power plants face early shutdown because they cannot compete economically with alternatives such as gas-fired power plants. One way to address this lack of competitiveness is to reduce the fuel cycle cost by optimizing core loading patterns. However, this optimization task involves multiple objectives and constraints, and the resulting space of candidate solutions is far too large to enumerate explicitly. Although various nuclear utilities and vendors use stochastic optimization (SO) methods for fuel cycle reload design, manual design remains the preferred approach. To advance the state of the art in core reload design, we have developed methods based on deep reinforcement learning (RL). Previous research laid the groundwork for this approach and demonstrated its ability to find high-quality patterns within a reasonable time. However, it must still be compared against legacy methods to demonstrate its utility in a single-objective setting. Moreover, while RL methods have shown superiority in multi-objective settings, they have not yet been applied effectively to the competitiveness problem. In this paper, we rigorously compare our RL-based approach against the most commonly used SO methods, namely the Genetic Algorithm (GA), Simulated Annealing (SA), and Tabu Search (TS). We then introduce a new hybrid paradigm to devise innovative designs, yielding economic gains of 2.8 to 3.3 million dollars per year per plant. This development leverages interpretable AI, which improves algorithmic efficiency by making black-box optimizations interpretable. Future work will focus on scaling this method to a broader range of core designs.
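Simulated annealing (SA) is one of the legacy SO baselines named above. The following is only a minimal sketch of what such a baseline does, under loudly stated assumptions: a loading pattern is modeled as a permutation of fuel assemblies over core positions, neighbors are generated by swapping two assemblies, and a made-up toy cost stands in for a real core simulator. The function `simulated_annealing`, its cooling parameters, and the `reactivity`/`importance` data are hypothetical illustrations, not the authors' implementation or their objective.

```python
import math
import random


def simulated_annealing(cost, n, steps=5000, t0=1.0, t_end=1e-3, seed=0):
    """Minimize cost(perm) over permutations of range(n) using swap moves.

    Hypothetical baseline sketch; a real loading-pattern optimizer would call
    a core simulator and handle symmetry and safety constraints.
    """
    rng = random.Random(seed)
    current = list(range(n))
    rng.shuffle(current)  # random initial loading pattern
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    for k in range(steps):
        # Geometric cooling schedule from t0 down to t_end.
        t = t0 * (t_end / t0) ** (k / max(steps - 1, 1))
        # Neighbor move: swap the assemblies at two distinct positions.
        i, j = rng.sample(range(n), 2)
        candidate = current[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost


# Hypothetical toy data: per-assembly reactivity and per-position weight
# (higher weight = more central position). Placing reactive fuel at heavily
# weighted positions is penalized, as a crude stand-in for power peaking.
reactivity = [1.30, 1.25, 1.20, 1.10, 1.05, 1.00, 0.95, 0.90]
importance = [8, 7, 6, 5, 4, 3, 2, 1]


def toy_cost(perm):
    # perm[pos] is the index of the assembly loaded at position pos.
    return sum(w * reactivity[a] for w, a in zip(importance, perm))


best, c = simulated_annealing(toy_cost, len(reactivity))
print(best, round(c, 3))
```

The accept-worse-moves rule is what distinguishes SA from plain hill climbing: early on, at high temperature, it lets the search escape local minima, and as the temperature falls it behaves increasingly greedily. GA and TS differ mainly in how they generate and filter candidate patterns.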

翻译：核能领域是美国无碳能源的主要来源。然而，现有核电站因无法在经济上与天然气发电厂等替代能源竞争，面临提前关停的威胁。通过优化堆芯装载模式来降低燃料循环成本，是解决这一竞争力不足问题的途径之一。但该优化任务涉及多目标与多约束条件，导致候选方案数量庞大而无法显式求解。尽管核电行业的多家公用事业公司和供应商在燃料循环换料设计中采用了随机优化方法，但人工设计仍是首选方案。为推进堆芯换料模式的最新技术，我们开发了基于深度强化学习的方法。此前的研究已为此方法奠定基础，并证明其能够在合理时间内发现高质量换料模式。然而，为验证其在单目标场景中的实用性，仍需与传统方法进行对比。尽管强化学习方法在多目标场景中已展现优越性，但其尚未被有效应用于解决竞争力问题。本文中，我们将基于强化学习的方法与最常用的随机优化方法——遗传算法、模拟退火算法和禁忌搜索算法——进行了严格对比。随后，我们引入一种新的混合范式来设计创新方案，每座电站每年可实现280万至330万美元的经济收益。该开发利用了可解释人工智能，通过使黑箱优化过程可解释，提升了算法效率。未来工作将聚焦于将该方法扩展至更广泛的堆芯设计方案。