Filling time-series gaps using image techniques: Multidimensional context autoencoder approach for building energy data imputation

Building energy prediction and management have become increasingly important in recent decades, driven by the growth of Internet of Things (IoT) devices and the availability of more energy data. However, energy data is often collected from multiple sources and can be incomplete or inconsistent, which hinders accurate prediction and management of energy systems and limits the usefulness of the data for decision-making and research. To address this issue, past studies have focused on imputing gaps in energy data, both random and continuous. One of the main challenges in this area is the lack of validation on a benchmark dataset covering various building and meter types, which makes it difficult to evaluate the performance of different imputation methods accurately. Another challenge is that state-of-the-art imputation methods have rarely been applied to gaps in energy data. Contemporary image-inpainting methods, such as Partial Convolution (PConv), are widely used in the computer vision domain and have proven effective at handling complex missing patterns. To study whether energy data imputation can benefit from image-based deep learning, this study compared PConv, convolutional neural networks (CNNs), and a weekly persistence method, using one of the largest publicly available whole-building energy datasets, consisting of 1479 power meters worldwide, as the benchmark. The results show that, compared with a CNN applied to the raw time series (1D-CNN) and with the weekly persistence method, neural network models applied to energy data reshaped into two dimensions reduced the Mean Squared Error (MSE) by 10% to 30%. The more advanced deep learning method, PConv, reduced the MSE by a further 20% to 30% relative to the 2D-CNN and performed best among all models.
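To make the two simplest ingredients of this comparison concrete, the following is a minimal sketch of reshaping an hourly meter series into a two-dimensional array and of a weekly persistence baseline scored by MSE on the masked gap. It is not the study's implementation. The day-by-hour layout, the synthetic series, the fallback to the following week, and the function names `reshape_to_2d`, `weekly_persistence`, and `masked_mse` are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def reshape_to_2d(series, width=HOURS_PER_DAY):
    """Fold an hourly 1D series into a (days, hours) array.

    Assumed layout: one row per day, one column per hour of day.
    Any incomplete trailing day is dropped.
    """
    values = np.asarray(series, dtype=float)
    n_rows = len(values) // width
    return values[: n_rows * width].reshape(n_rows, width)


def weekly_persistence(series, lag=HOURS_PER_WEEK):
    """Fill NaNs with the value observed one week earlier.

    Assumption: if the earlier value is unavailable, fall back to the
    value one week later; if neither exists, the NaN is left in place.
    """
    filled = np.array(series, dtype=float)
    for i in np.flatnonzero(np.isnan(filled)):
        if i - lag >= 0 and not np.isnan(filled[i - lag]):
            filled[i] = filled[i - lag]
        elif i + lag < len(filled) and not np.isnan(filled[i + lag]):
            filled[i] = filled[i + lag]
    return filled


def masked_mse(truth, imputed, mask):
    """Mean squared error computed only over the masked (missing) positions."""
    return float(np.mean((truth[mask] - imputed[mask]) ** 2))


# Synthetic three-week hourly load with an exactly repeating weekly pattern.
hours = np.arange(HOURS_PER_WEEK)
is_weekend = (hours // HOURS_PER_DAY) >= 5
one_week = 10 + 5 * np.sin(2 * np.pi * hours / HOURS_PER_DAY) - 3 * is_weekend
truth = np.tile(one_week, 3)

# Remove a continuous 24-hour gap in the second week.
mask = np.zeros(truth.shape, dtype=bool)
mask[200:224] = True
observed = truth.copy()
observed[mask] = np.nan

# 2D view: the gap now has observed neighbours in the same hour on adjacent days.
image = reshape_to_2d(observed)

imputed = weekly_persistence(observed)
mse = masked_mse(truth, imputed, mask)
```

In the two-dimensional view, a continuous gap borders observed values both along the hour axis and along the day axis, which is the kind of context that image-inpainting models such as PConv are designed to use. The weekly persistence baseline recovers this synthetic series exactly only because the series repeats perfectly from week to week; real meter data would not.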
