Extreme weather and volatile wholesale electricity markets expose residential consumers to catastrophic financial risks, yet demand response at the distribution level remains an underutilized tool for grid flexibility and energy affordability. A demand-response program can shield consumers by issuing financial credits during high-price periods. However, optimizing this sequential decision-making process poses a distinctive challenge for reinforcement learning, even though abundant historical smart-meter and wholesale-price data are publicly available. Such offline data cannot capture the dynamic, interactive feedback loop between an electric utility's pricing signals and customers' acceptance of, and adaptation to, a demand-response program. To address this gap, we introduce DR-Gym, an open-source, online, Gymnasium-compatible environment for training and evaluating demand-response policies from the electric utility's perspective. Unlike existing device-level energy simulators, DR-Gym targets the market-level utility setting and provides a rich observation space relevant to the electric utility. The simulator also features a regime-switching wholesale price model calibrated to real-world extreme events, together with physics-based building demand profiles. As the learning signal, it uses a configurable multi-objective reward function that supports diverse learning objectives. Using baseline strategies and data snapshots, we demonstrate that the simulator produces realistic and learnable environments.
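The abstract does not specify DR-Gym's API, price-model parameters, or reward terms. As a minimal sketch under stated assumptions, the toy environment below illustrates the three ingredients it names. It uses a two-state Markov-switching price process (one common form of regime switching, assumed here), a configurable weighted-sum reward, and the Gymnasium `reset`/`step` return conventions, without importing Gymnasium. All class names, parameter values, the customer acceptance curve, and the reward terms (`avoided_cost`, `credit_paid`) are hypothetical and are not the DR-Gym implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class RegimeSwitchingPrice:
    """Hypothetical two-regime Markov-switching wholesale price ($/MWh)."""
    means: tuple = (40.0, 1500.0)   # regime 0 = normal, regime 1 = extreme event
    stds: tuple = (10.0, 500.0)
    p_switch: tuple = (0.01, 0.2)   # per-step probability of leaving each regime
    price_floor: float = 0.0        # assumption: negative prices are clipped
    regime: int = 0

    def step(self, rng: random.Random) -> float:
        # Possibly switch regime, then draw a Gaussian price for that regime.
        if rng.random() < self.p_switch[self.regime]:
            self.regime = 1 - self.regime
        price = rng.gauss(self.means[self.regime], self.stds[self.regime])
        return max(self.price_floor, price)


def multi_objective_reward(terms: dict, weights: dict) -> float:
    # Weighted sum of named objectives; terms without a weight contribute 0.
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())


class ToyDemandResponseEnv:
    """Hypothetical utility-side DR environment with Gymnasium-style signatures."""

    def __init__(self, weights: dict, horizon: int = 24,
                 base_load_mwh: float = 100.0, seed=None):
        self.weights = weights
        self.horizon = horizon
        self.base_load = base_load_mwh
        self.rng = random.Random(seed)
        self.price_model = RegimeSwitchingPrice()
        self.t = 0
        self.price = 0.0

    def _obs(self):
        # Observation: (time step, current wholesale price, current regime).
        return (self.t, self.price, self.price_model.regime)

    def reset(self, seed=None, options=None):
        # Gymnasium convention: reset returns (observation, info).
        if seed is not None:
            self.rng = random.Random(seed)
        self.price_model = RegimeSwitchingPrice()
        self.t = 0
        self.price = self.price_model.step(self.rng)
        return self._obs(), {}

    def step(self, action):
        # Action: credit offered per curtailed MWh ($/MWh), clipped at 0.
        credit = max(0.0, float(action))
        # Hypothetical acceptance curve: curtailment grows with credit, capped at 20%.
        curtailed = min(0.2, credit / 1000.0) * self.base_load
        terms = {
            "avoided_cost": curtailed * self.price,  # wholesale purchases avoided
            "credit_paid": -curtailed * credit,      # credits paid to customers
        }
        reward = multi_objective_reward(terms, self.weights)
        self.t += 1
        self.price = self.price_model.step(self.rng)
        truncated = self.t >= self.horizon
        # Gymnasium convention: (obs, reward, terminated, truncated, info).
        return self._obs(), reward, False, truncated, terms


if __name__ == "__main__":
    env = ToyDemandResponseEnv(weights={"avoided_cost": 1.0, "credit_paid": 1.0})
    obs, info = env.reset(seed=0)
    total, truncated = 0.0, False
    while not truncated:
        # Threshold baseline: offer a credit only when the price is high.
        action = 200.0 if obs[1] > 100.0 else 0.0
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
    print(f"episode reward: {total:.1f}")
```

Changing the `weights` dictionary re-specifies the learning objective without touching the environment dynamics, which is one simple way a configurable multi-objective reward can be exposed.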