Forecasting indoor temperatures is important for efficient control of HVAC systems. Limited data availability makes this task challenging: most available data is acquired during standard operation, where extreme scenarios and transitory regimes such as major temperature increases or decreases are de facto excluded. Acquiring such data requires significant energy consumption and a dedicated facility, which limits the quantity and diversity of available data, and cost-related constraints do not allow continuous year-round acquisition. To address this, we investigate the efficacy of data augmentation techniques that leverage state-of-the-art (SoTA) AI-based methods for synthetic data generation. Motivated by practical and experimental considerations, we explore strategies for fusing real and synthetic data to improve forecasting models. This approach alleviates the need to continuously acquire extensive time series data, especially in contexts involving repetitive heating and cooling cycles in buildings. Our evaluation has two parts. First, we assess the performance of synthetic data generators independently, focusing on SoTA AI-based methods. Second, we measure the utility of synthetically augmented data in a subsequent forecasting task, where we employ a simple model in two distinct scenarios: (a) an augmentation technique that combines real and synthetically generated data to expand the training dataset, and (b) the use of synthetic data to address dataset imbalances. Our results highlight the potential of synthetic data augmentation to enhance forecasting accuracy while reducing training variance. Through empirical experiments, we show that integrating synthetic data can yield significant improvements, paving the way for more robust forecasting models in low-data regimes.