Data privacy and security have become non-negligible factors in load forecasting. Previous research has mainly focused on enhancing the training stage. However, once a model is trained and deployed, it may need to "forget" (i.e., remove the impact of) part of the training data, either because these data are found to be malicious or because the data owner requests it. This paper introduces machine unlearning, which is specifically designed to remove the influence of part of the dataset on an already trained forecaster. However, direct unlearning inevitably degrades the model's generalization ability. To balance unlearning completeness against model performance, a performance-aware algorithm is proposed that evaluates the sensitivity of local model parameter changes using influence functions and sample re-weighting. Furthermore, we observe that statistical criteria such as mean squared error cannot fully reflect the operational cost of downstream tasks in power systems. Therefore, a task-aware machine unlearning method is proposed whose objective is a trilevel optimization that accounts for the dispatch and redispatch problems. We theoretically prove the existence of the gradient of such an objective, which is key to re-weighting the remaining samples. We test the unlearning algorithms on linear, CNN, and MLP-Mixer based load forecasters with a realistic load dataset. The simulation demonstrates the balance between unlearning completeness and operational cost. All code can be found at https://github.com/xuwkk/task_aware_machine_unlearning.
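To make the idea of removing the influence of part of the training data concrete, the following is a minimal sketch of influence-function-style (Newton-step) unlearning on a linear forecaster with a ridge penalty. It is not the performance-aware or task-aware algorithm proposed in the paper, and it omits sample re-weighting and the dispatch/redispatch objective entirely. The function names (`fit_ridge`, `unlearn_newton`), the synthetic data, and the regularization value are assumptions chosen for illustration only.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge fit: minimizes 0.5*||X theta - y||^2 + 0.5*lam*||theta||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def unlearn_newton(theta, X, y, forget_idx, lam):
    """One Newton step that removes the forgotten samples' contribution.

    At the trained optimum the total gradient is zero, so the gradient of the
    remaining-data objective equals minus the gradient of the forgotten samples.
    For a quadratic loss this single step coincides with retraining.
    """
    keep = np.ones(len(y), dtype=bool)
    keep[forget_idx] = False
    X_f, y_f = X[~keep], y[~keep]
    X_r = X[keep]
    # Summed loss gradient of the samples to forget, evaluated at theta.
    grad_forget = X_f.T @ (X_f @ theta - y_f)
    # Hessian of the objective restricted to the remaining samples.
    hess_remain = X_r.T @ X_r + lam * np.eye(X.shape[1])
    return theta + np.linalg.solve(hess_remain, grad_forget)

# Synthetic stand-in for a load dataset (assumption: 200 samples, 5 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
lam = 1e-2
forget_idx = np.arange(20)  # pretend the first 20 samples must be forgotten

theta_full = fit_ridge(X, y, lam)
theta_unlearned = unlearn_newton(theta_full, X, y, forget_idx, lam)
theta_retrained = fit_ridge(X[20:], y[20:], lam)

# Largest coordinate-wise difference between unlearning and full retraining.
gap = float(np.max(np.abs(theta_unlearned - theta_retrained)))
print(f"max |unlearned - retrained| = {gap:.2e}")
```

For this quadratic objective the update reproduces retraining on the remaining data up to numerical error. For nonlinear forecasters such as CNN or MLP-Mixer models, the same step would only be a local approximation.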