Access to granular demand data is essential for the net zero transition: it allows for accurate profiling and active demand management as our reliance on variable renewable generation increases. However, public release of this data is often impossible due to privacy concerns. Good-quality synthetic data can circumvent this issue. Despite significant research on generating synthetic smart meter data, there is still insufficient work on creating a consistent evaluation framework. In this paper, we investigate how the evaluation frameworks commonly used in other industries that leverage synthetic data, built around fidelity, utility and privacy, can be applied to synthetic smart meter data. We also recommend specific metrics to ensure that defining aspects of smart meter data are preserved, and we test the extent to which privacy can be protected using differential privacy. We show that standard privacy attack methods, such as reconstruction or membership inference attacks, are inadequate for assessing the privacy risks of smart meter datasets. We propose an improved method: injecting implausible outliers into the training data, then launching privacy attacks directly on these outliers. The choice of $\epsilon$ (the differential privacy parameter that bounds privacy loss) significantly impacts privacy risk, highlighting the necessity of performing these explicit privacy tests when making trade-offs between fidelity and privacy.
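The outlier-injection test described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`inject_outliers`, `canary_leaked`), the flat high-load "canary" profiles, the 48 half-hourly readings per day, and the distance threshold are all assumptions made for illustration, and a simple nearest-record distance check stands in for a full reconstruction or membership inference attack.

```python
import numpy as np


def inject_outliers(train, n_outliers, magnitude):
    """Append implausible flat-load profiles (canaries) to the training set.

    train: array of shape (n_profiles, n_steps), e.g. daily load in kWh.
    magnitude: load level far above any realistic reading (assumed value).
    """
    n_steps = train.shape[1]
    # Give each canary a slightly different level so they are distinguishable.
    levels = magnitude * (1.0 + 0.1 * np.arange(n_outliers))
    canaries = np.repeat(levels[:, None], n_steps, axis=1)
    return np.vstack([train, canaries]), canaries


def min_distance_to_synthetic(targets, synthetic):
    """Euclidean distance from each target to its nearest synthetic profile."""
    diffs = targets[:, None, :] - synthetic[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


def canary_leaked(canaries, synthetic, threshold):
    """Flag canaries that reappear (within `threshold`) in the synthetic data."""
    return min_distance_to_synthetic(canaries, synthetic) < threshold


# Toy data standing in for real smart meter profiles (hypothetical values).
rng = np.random.default_rng(0)
train = rng.gamma(shape=2.0, scale=0.25, size=(200, 48))
augmented, canaries = inject_outliers(train, n_outliers=3, magnitude=50.0)

# Stand-in for a generator that memorises its training data: noisy copies.
memorising_output = augmented + rng.normal(0.0, 0.01, size=augmented.shape)
print(canary_leaked(canaries, memorising_output, threshold=1.0))
```

In practice the synthetic set would come from the generator under test, trained on `augmented` at a given $\epsilon$, and the share of canaries flagged would be compared across $\epsilon$ values. The point of the design is that canaries are, by construction, records an attacker could only recover through leakage from the training data.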