Learning processes by exploiting restricted domain knowledge is an important task across many scientific areas, and a growing number of hybrid methods combine data-driven and model-based approaches. However, although such hybrid methods have been applied in various scientific settings, they have mostly been tested on dynamical systems, and the influence of each model component on global performance and parameter identification has received only limited study. In this work, we assess the performance of hybrid modeling against traditional machine learning methods on standard regression problems. We compare several approaches for training such hybrid models on both synthetic and real regression problems. We focus on hybrid methods that additively combine a parametric physical term with a machine learning term, and we investigate model-agnostic training procedures. We also introduce a new hybrid approach based on partial dependence functions. Experiments are carried out with different types of machine learning models, including tree-based models and artificial neural networks.
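To make the additive structure concrete, the following is a minimal sketch of one possible model-agnostic training scheme: alternating between fitting the physical parameter and fitting the machine learning term on the remaining residual. It is an illustration under stated assumptions, not the procedure evaluated in this work. The linear physical term `theta * x_p`, the `BinnedMeanRegressor` stand-in for a tree-based model, the function `alternate_fit`, and the synthetic data are all hypothetical choices made for the example.

```python
import math
import random


def physical_term(x_p, theta):
    # Assumed parametric physical term: linear in x_p with one parameter theta.
    return theta * x_p


class BinnedMeanRegressor:
    """Hypothetical stand-in for the ML term: a piecewise-constant fit on [0, 1]."""

    def __init__(self, n_bins=10):
        self.n_bins = n_bins
        self.means = [0.0] * n_bins

    def _bin(self, x):
        # Map x in [0, 1] to a bin index, clamping values outside the range.
        return min(max(int(x * self.n_bins), 0), self.n_bins - 1)

    def fit(self, xs, targets):
        sums = [0.0] * self.n_bins
        counts = [0] * self.n_bins
        for x, t in zip(xs, targets):
            b = self._bin(x)
            sums[b] += t
            counts[b] += 1
        self.means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        return self

    def predict(self, xs):
        return [self.means[self._bin(x)] for x in xs]


def fit_theta(x_p, targets):
    # Closed-form least squares for theta in targets ~ theta * x_p.
    num = sum(a * t for a, t in zip(x_p, targets))
    den = sum(a * a for a in x_p)
    return num / den


def alternate_fit(x_p, x_ml, y, n_rounds=20):
    # Alternate between the two terms, each fitted on the other's residual.
    theta = 0.0
    ml = BinnedMeanRegressor()
    ml_pred = [0.0] * len(y)
    for _ in range(n_rounds):
        theta = fit_theta(x_p, [yi - mi for yi, mi in zip(y, ml_pred)])
        residual = [yi - physical_term(a, theta) for yi, a in zip(y, x_p)]
        ml_pred = ml.fit(x_ml, residual).predict(x_ml)
    return theta, ml


# Synthetic data (assumed for illustration): y = 2 * x_p + sin(2 * pi * x_ml) + noise.
rng = random.Random(0)
n = 2000
x_p = [rng.random() for _ in range(n)]
x_ml = [rng.random() for _ in range(n)]
y = [2.0 * a + math.sin(2 * math.pi * b) + rng.gauss(0.0, 0.1)
     for a, b in zip(x_p, x_ml)]

theta_hat, ml_term = alternate_fit(x_p, x_ml, y)
pred = [physical_term(a, theta_hat) + m
        for a, m in zip(x_p, ml_term.predict(x_ml))]
mse = sum((p - t) ** 2 for p, t in zip(pred, y)) / n
print(theta_hat, mse)
```

In this sketch the machine learning term only sees `x_ml`, which keeps `theta` identifiable. If the machine learning term also received `x_p`, a sufficiently flexible model could absorb part of the physical effect, which is one reason parameter identification in additive hybrid models needs dedicated study.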