The demand of machine learning (ML) applications for very large datasets is currently a bottleneck in an empirically dominated field. We propose combining prior knowledge with data-driven methods to significantly reduce their data dependency. In this study, component-based machine learning (CBML), a knowledge-encoded data-driven method, is examined in the context of energy-efficient building engineering. CBML encodes an abstraction of building structural knowledge as semantic information in the organization of the model. We design a case experiment to assess the efficacy of knowledge-encoded ML under sparse data input (sampling rates from 1% down to 0.0125%). The results reveal three advantages over pure ML methods: (1) significantly improved robustness to extremely small and inconsistent datasets; (2) efficient use of data from different entities' record collections; and (3) the ability to accept incomplete data, with high interpretability and reduced training time. Together, these features offer a promising path to alleviating the deployment bottleneck of data-intensive methods and contribute to efficient use of real-world data. In addition, this study summarizes four necessary prerequisites that ensure a target scenario benefits from combining prior knowledge with ML generalization.