Smart meter data are the foundation for planning and operating distribution networks. Unfortunately, such data are not always available because of privacy regulations. Moreover, the collected data may be corrupted by sensor or transmission failures, or may lack the resolution required by downstream tasks. A wide range of generative tasks has been formulated to address these issues, including synthetic data generation, missing data imputation, and super-resolution. Although machine learning models have been successful on these tasks, a dedicated model must be designed and trained for each one, which leads to redundancy and inefficiency. In this paper, recognizing the powerful modeling capability of flow matching models, we propose a new approach that unifies diverse smart meter data generative tasks with a single model trained for conditional generation. The proposed flow matching models are trained to generate challenging, high-dimensional time series data, specifically monthly smart meter data at 15 min resolution. By viewing different generative tasks as distinct forms of partial data observations and injecting these observations into the generation process, we handle tasks such as imputation and super-resolution with a single model, without retraining. The data generated by our model are not only consistent with the given observations but also realistic, and the model outperforms interpolation and other machine learning baselines dedicated to these tasks.
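To make the idea of "tasks as partial observations" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 30-day month (2880 readings at 15 min resolution), models imputation as a mask over surviving readings, and models super-resolution as block averages of the fine-grained series. All function names are hypothetical, the "generated" profile is random stand-in data rather than a flow matching sample, and the consistency step is a simple post-hoc projection, whereas the paper injects observations during generation.

```python
import numpy as np

STEPS_PER_DAY = 96          # 24 h at 15 min resolution
DAYS = 30                   # assumed month length (not stated in the abstract)
T = STEPS_PER_DAY * DAYS    # 2880 readings per monthly profile


def imputation_observation(x, missing_mask):
    """Partial observation for imputation: missing readings become NaN."""
    y = x.copy()
    y[missing_mask] = np.nan
    return y


def super_resolution_observation(x, factor=4):
    """Partial observation for super-resolution: block averages.

    With factor=4, a 15 min series is reduced to hourly means.
    """
    return x.reshape(-1, factor).mean(axis=1)


def enforce_imputation(sample, y):
    """Overwrite generated values wherever a true reading is available."""
    observed = ~np.isnan(y)
    out = sample.copy()
    out[observed] = y[observed]
    return out


def enforce_super_resolution(sample, y_low, factor=4):
    """Shift each block so its mean equals the low-resolution reading.

    This additive shift can produce negative values; it is only meant to
    illustrate what consistency with the observation means.
    """
    blocks = sample.reshape(-1, factor)
    shift = y_low - blocks.mean(axis=1)
    return (blocks + shift[:, None]).reshape(-1)


rng = np.random.default_rng(0)
x_true = rng.gamma(shape=2.0, scale=0.5, size=T)   # stand-in for a real profile
x_gen = rng.gamma(shape=2.0, scale=0.5, size=T)    # stand-in for a model sample

# Imputation: a simulated outage of 200 consecutive readings.
missing = np.zeros(T, dtype=bool)
missing[500:700] = True
y_imp = imputation_observation(x_true, missing)
x_imp = enforce_imputation(x_gen, y_imp)

# Super-resolution: hourly observations of a 15 min series.
y_sr = super_resolution_observation(x_true, factor=4)
x_sr = enforce_super_resolution(x_gen, y_sr, factor=4)
```

Both tasks use the same generated sample and differ only in the observation they are given, which is the property that lets a single conditional model serve multiple tasks without retraining.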