Power systems research relies on real-world network datasets, yet data owners (e.g., system operators) are hesitant to share them due to security and privacy risks. To control these risks, we develop privacy-preserving algorithms for the synthetic generation of optimization and machine learning datasets. Taking a real-world dataset as input, the algorithms output a noisy, synthetic version of it that preserves the accuracy of the real data on a specific downstream model, or even on a large population of such models. We control the privacy loss using the Laplace and exponential mechanisms of differential privacy and preserve data accuracy using post-processing convex optimization. We apply the algorithms to generate synthetic network parameters and wind power data.
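To make the two ingredients named above concrete, the following is a minimal sketch, not the authors' algorithm. It perturbs a small vector with the Laplace mechanism and then applies a very simple convex post-processing step: a Euclidean projection onto box constraints, whose closed-form solution is element-wise clipping. The function names, the wind power values, the sensitivity of 0.1, and the privacy parameter of 1.0 are all hypothetical choices made for illustration; the post-processing described in the abstract targets accuracy on downstream models and is likely more involved than this projection.

```python
import numpy as np


def laplace_mechanism(data, sensitivity, epsilon, rng):
    """Add i.i.d. Laplace noise with scale sensitivity / epsilon to each entry."""
    scale = sensitivity / epsilon
    return data + rng.laplace(loc=0.0, scale=scale, size=data.shape)


def project_to_box(noisy, lower, upper):
    """Solve min ||x - noisy||^2 subject to lower <= x <= upper.

    This convex problem has a closed-form solution: element-wise clipping.
    It uses only the noisy data, never the real data.
    """
    return np.clip(noisy, lower, upper)


rng = np.random.default_rng(0)

# Hypothetical per-unit wind power outputs (assumed values, not from the paper).
real_wind = np.array([0.12, 0.55, 0.80, 0.31])

# Assumed sensitivity and privacy parameter, chosen only for illustration.
noisy = laplace_mechanism(real_wind, sensitivity=0.1, epsilon=1.0, rng=rng)

# Restore physical feasibility: per-unit wind power must lie in [0, 1].
synthetic = project_to_box(noisy, lower=0.0, upper=1.0)
print(synthetic)
```

Because the projection in this sketch depends only on the noisy output, it is pure post-processing and does not consume additional privacy budget. A post-processing step that consults the real data again, for example to match a downstream model's cost, would need its own privacy accounting.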