We address the problem of scheduling charging for electric vehicles (EVs) equipped with solar panels and batteries, particularly under out-of-distribution (OOD) conditions. Standard scheduling approaches, such as reinforcement learning (RL) and model predictive control (MPC), often perform poorly on OOD data and struggle to balance robustness (worst-case performance) against consistency (near-optimal average performance). To address this gap, we introduce a learning-augmented policy with a dynamic robustness budget that is adapted in real time according to the RL policy's performance. Specifically, the policy uses the temporal difference (TD) error, a measure of the learned policy's prediction accuracy, to assess how far the machine-learned policy can be trusted. This yields a more effective balance between consistency and robustness in EV charging schedules and improves adaptability and efficiency in unpredictable real-world environments. Our results show that the approach improves scheduling effectiveness and reliability, particularly in OOD settings, paving the way for more resilient and adaptive EV charging systems.
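The idea of a TD-error-driven robustness budget can be illustrated with a minimal sketch. The abstract does not specify the update rule, so everything below is an assumption made for illustration: the function names, the smoothing rule in `update_budget`, and the choice to project the learned action onto a ball around a robust baseline action are hypothetical and are not taken from the paper. Only the one-step TD error itself is the standard definition.

```python
import math


def td_error(reward, gamma, v_state, v_next):
    """Standard one-step TD error: r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_state


def update_budget(budget, delta, step=0.5, max_budget=1.0):
    """Hypothetical rule: move the budget toward a target that shrinks
    as |delta| grows, so large TD errors reduce trust in the learned policy."""
    target = max_budget / (1.0 + abs(delta))
    return (1.0 - step) * budget + step * target


def project_to_ball(ml_action, robust_action, budget):
    """Assumed mechanism: keep the learned action if it lies within
    `budget` (L2 distance) of the robust action, else pull it back
    onto the boundary of that ball."""
    diff = [m - r for m, r in zip(ml_action, robust_action)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= budget:
        return list(ml_action)
    shrink = budget / norm
    return [r + shrink * d for r, d in zip(robust_action, diff)]


if __name__ == "__main__":
    budget = 1.0
    # Illustrative numbers only: reward, discount, and value estimates.
    delta = td_error(reward=1.0, gamma=0.9, v_state=2.0, v_next=3.0)
    budget = update_budget(budget, delta)
    action = project_to_ball([3.0, 4.0], [0.0, 0.0], budget)
    print(delta, budget, action)
```

In this sketch, a small TD error leaves the budget large, so the learned action passes through mostly unchanged (consistency), while a large TD error shrinks the budget and keeps the applied action close to the robust baseline (robustness).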