Deep reinforcement learning (DRL) agents can violate operational constraints when they interact with real-world active distribution systems to learn optimal policies during training. This letter presents a universal distributionally robust safety filter (DRSF) with which any DRL agent can significantly reduce constraint violations in distribution systems during training while maintaining near-optimal solutions. The DRSF is formulated as a distributionally robust optimization problem with chance constraints on operational limits. Leveraging the distribution system model, the problem computes near-optimal actions that are minimally modified from the optimal actions of DRL-based Volt/VAr control, thereby guaranteeing constraint satisfaction with a given probability level under model uncertainty. The performance of the proposed DRSF is verified on the IEEE 33-bus and 123-bus systems.
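As an illustrative sketch only, a safety filter of this kind can be written in the following generic form. The abstract does not give the formulation, so every symbol here is an assumption: $a^{\mathrm{DRL}}$ is the action proposed by the DRL agent, $a$ the filtered action, $\mathcal{A}$ the set of admissible control actions, $\xi$ the uncertain model error, $\mathcal{P}$ an ambiguity set of candidate distributions for $\xi$, $v(a,\xi)$ the bus voltages predicted by the distribution system model, $[\underline{v}, \overline{v}]$ the voltage limits, and $1-\epsilon$ the required probability level. Voltage limits are used as a representative operational limit, and the letter's actual objective, constraint set, and ambiguity set may differ.

```latex
\begin{aligned}
\min_{a \in \mathcal{A}} \quad & \left\| a - a^{\mathrm{DRL}} \right\|_2^2 \\
\text{s.t.} \quad & \inf_{\mathbb{P} \in \mathcal{P}} \;
  \mathbb{P}\!\left( \underline{v} \le v(a,\xi) \le \overline{v} \right) \ge 1 - \epsilon
\end{aligned}
```

Under these assumptions, the objective keeps the filtered action as close as possible to the DRL action, while the constraint requires the limits to hold with probability at least $1-\epsilon$ under the worst-case distribution in $\mathcal{P}$.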