BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting

From arXiv: NeurIPS 2023 Datasets & Benchmarks Track camera-ready version, 35 pages. Code is available at https://github.com/NREL/BuildingsBench/ and data at https://data.openei.org/submissions/5859

Short-term forecasting of residential and commercial building energy consumption is widely used in power systems and continues to grow in importance. Data-driven short-term load forecasting (STLF), although promising, has suffered from a lack of open, large-scale datasets with high building diversity. This has hindered exploring the pretrain-then-fine-tune paradigm for STLF. To help address this, we present BuildingsBench, which consists of: 1) Buildings-900K, a large-scale dataset of 900K simulated buildings representing the U.S. building stock; and 2) an evaluation platform with over 1,900 real residential and commercial buildings from 7 open datasets. BuildingsBench benchmarks two under-explored tasks: zero-shot STLF, where a pretrained model is evaluated on unseen buildings without fine-tuning, and transfer learning, where a pretrained model is fine-tuned on a target building. The main finding of our benchmark analysis is that synthetically pretrained models generalize surprisingly well to real commercial buildings. An exploration of the effect of increasing dataset size and diversity on zero-shot commercial building performance reveals a power-law with diminishing returns. We also show that fine-tuning pretrained models on real commercial and residential buildings improves performance for a majority of target buildings. We hope that BuildingsBench encourages and facilitates future research on generalizable STLF. All datasets and code can be accessed from https://github.com/NREL/BuildingsBench.
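To make the "power law with diminishing returns" claim concrete, the following is a minimal sketch, not the authors' analysis code. It assumes the relationship has the form error ≈ a · N^(-b), where N is the number of pretraining buildings; the function name `fit_power_law`, the constants, and all numbers are hypothetical and are generated from an exact power law rather than taken from the paper.

```python
import math

def fit_power_law(sizes, errors):
    """Least-squares fit of errors ≈ a * sizes**(-b), done in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Slope of the regression line log(error) = intercept + slope * log(size).
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope

# Hypothetical dataset sizes and errors (NOT results from the paper):
# errors are synthesized from a = 2.0, b = 0.1 purely for illustration.
sizes = [1_000, 10_000, 100_000, 900_000]
errors = [2.0 * n ** -0.1 for n in sizes]

a, b = fit_power_law(sizes, errors)

# Absolute error reduction gained by each successive increase in dataset size.
gains = [errors[i] - errors[i + 1] for i in range(len(errors) - 1)]
print(f"a={a:.3f}, b={b:.3f}, gains={[round(g, 3) for g in gains]}")
```

Under this assumed form, each further increase in dataset size buys a smaller absolute reduction in error, which is what "diminishing returns" refers to.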

