Data privacy and security have become non-negligible factors in load forecasting. Previous research has mainly focused on enhancing the training stage. However, once a model is trained and deployed, it may need to "forget" (i.e., remove the impact of) part of its training data if that data is found to be malicious or if the data owner requests its removal. This paper introduces a machine unlearning algorithm specifically designed to remove the influence of part of the original dataset on an already trained forecaster. However, direct unlearning inevitably degrades the model's generalization ability. To balance unlearning completeness against performance degradation, a performance-aware algorithm is proposed that evaluates the sensitivity of local model parameter changes using influence functions and sample re-weighting. Moreover, we observe that statistical criteria cannot fully reflect the operation cost of downstream tasks. Therefore, a task-aware machine unlearning algorithm is proposed whose objective is a tri-level optimization that considers dispatch and redispatch problems. We theoretically prove the existence of the gradient of this objective, which is key to re-weighting the remaining samples. We test the unlearning algorithms on linear and neural network load forecasters with a realistic load dataset. The simulations demonstrate the balance between unlearning completeness and operational cost. All code is available at https://github.com/xuwkk/task_aware_machine_unlearning.
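To make the idea of influence-function unlearning concrete, the following is a minimal sketch for a ridge-regularized linear forecaster. It is not the performance-aware or task-aware algorithm proposed in the paper: it applies a single Newton (influence-function) update with uniform weights on the remaining samples, and it omits the re-weighting and the dispatch/redispatch objective entirely. The function names `fit_ridge` and `unlearn_ridge`, the regularization weight `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge fit: minimizes 0.5*||X theta - y||^2 + 0.5*lam*||theta||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def unlearn_ridge(theta, X, y, forget_idx, lam):
    """Remove the influence of the samples in forget_idx with one Newton step.

    theta is assumed to be the ridge solution trained on all of (X, y).
    """
    mask = np.ones(len(y), dtype=bool)
    mask[forget_idx] = False
    X_r = X[mask]                      # remaining samples
    X_f, y_f = X[~mask], y[~mask]      # samples to forget
    # Hessian of the ridge objective on the remaining data only.
    H_r = X_r.T @ X_r + lam * np.eye(X.shape[1])
    # Summed loss gradient of the forgotten samples at the current theta.
    g_f = X_f.T @ (X_f @ theta - y_f)
    # Influence-function update: theta + H_r^{-1} g_f.
    return theta + np.linalg.solve(H_r, g_f)


# Synthetic data (illustrative only, not the load dataset used in the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
lam = 1.0
forget_idx = np.arange(20)

theta_full = fit_ridge(X, y, lam)
theta_unlearned = unlearn_ridge(theta_full, X, y, forget_idx, lam)
theta_retrained = fit_ridge(X[20:], y[20:], lam)

# Largest coordinate-wise gap between unlearning and retraining from scratch.
max_gap = float(np.max(np.abs(theta_unlearned - theta_retrained)))
print(max_gap)
```

Because the ridge objective is quadratic, this single update coincides with retraining on the remaining data up to numerical precision. For a neural network forecaster the same update would only be a local approximation, which is one reason a trade-off between unlearning completeness and performance arises.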