We present a data-driven framework that predicts quantities of interest, either observations or parameters, from sufficient partial data. The framework is illustrated with a computational model of Cu deposition in a Chemical Vapor Deposition (CVD) reactor, where the reactor pressure, the deposition temperature, and the feed mass flow rate are the process parameters that determine the outcome of the process. The sampled observations are high-dimensional vectors containing the outputs of a detailed steady-state CFD model of the process, i.e. the values of velocity, pressure, temperature, and species mass fractions at each point of the discretization. We present a machine learning workflow that predicts, out of sample, (a) observations (e.g. mass fraction in the reactor) given process parameters (e.g. inlet temperature); (b) process parameters given observation data; and (c) partial observations (e.g. temperature in the reactor) given other partial observations (e.g. mass fraction in the reactor). The workflow relies on two manifold learning schemes: Diffusion Maps and the associated Geometric Harmonics. Diffusion Maps is used to discover a reduced representation of the available data, and Geometric Harmonics to extend functions defined on the manifold. We formulate and implement a special use case of Geometric Harmonics, which we call Double Diffusion Maps, to map from the reduced representation back to (partial) observations and process parameters. We also compare our manifold learning scheme with the traditional Gappy-POD approach: ours can be thought of as a "Gappy DMAP" approach. The methodology is readily transferable to application domains beyond reactor engineering.
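The two building blocks named in the abstract can be illustrated on toy data. The following is a minimal sketch, not the authors' implementation: the helix-shaped dataset, the kernel scales `eps`, the truncation threshold `delta`, and the helper names `sq_dists`, `diffusion_maps`, `gh_fit` and `gh_predict` are all assumptions introduced here. It computes one Diffusion Maps coordinate from 3-D "observations" and then uses a plain Geometric Harmonics extension to map that coordinate back to one observed component. As a simplification, the latent coordinates of the held-out points come from a single Diffusion Maps run over all samples; a truly out-of-sample workflow would also have to extend the latent coordinates themselves.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)

def diffusion_maps(X, eps, n_coords=1):
    """Leading non-trivial eigenvalues and diffusion coordinates of the data X."""
    K = np.exp(-sq_dists(X, X) / eps)
    # Density normalization (alpha = 1) to reduce the effect of sampling density.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    # Symmetric matrix with the same spectrum as the row-stochastic Markov matrix.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Convert to right eigenvectors of the Markov matrix; drop the trivial constant one.
    psi = vecs / np.sqrt(d)[:, None]
    return vals[1:n_coords + 1], psi[:, 1:n_coords + 1]

def gh_fit(Y, F, eps, delta=1e-8):
    """Project F (sampled at the points Y) onto the leading kernel eigenvectors."""
    lam, phi = np.linalg.eigh(np.exp(-sq_dists(Y, Y) / eps))
    keep = lam > delta * lam.max()  # discard numerically unstable modes
    lam, phi = lam[keep], phi[:, keep]
    return Y, eps, lam, phi, phi.T @ F

def gh_predict(model, Y_new):
    """Extend the fitted function to new points (Nystrom-type extension)."""
    Y, eps, lam, phi, coeffs = model
    phi_new = np.exp(-sq_dists(Y_new, Y) / eps) @ phi / lam
    return phi_new @ coeffs

# Toy data (assumption): 200 points on a helix arc, governed by one hidden parameter t.
t = np.linspace(0.0, np.pi, 200)
X = np.column_stack([np.cos(t), np.sin(t), t])

# Reduced representation: a single diffusion coordinate.
_, psi = diffusion_maps(X, eps=0.05, n_coords=1)
corr = np.corrcoef(psi[:, 0], t)[0, 1]

# Map the latent coordinate back to one observed component, testing on held-out points.
y = psi / np.abs(psi).max()  # rescale so that the kernel scale below is meaningful
train, test = np.arange(0, 200, 2), np.arange(1, 200, 2)
model = gh_fit(y[train], X[train, 0], eps=0.1)
pred = gh_predict(model, y[test])
max_err = np.max(np.abs(pred - X[test, 0]))

print("correlation of the diffusion coordinate with t:", corr)
print("max reconstruction error on held-out points:", max_err)
```

In this sketch the kernel scale and the eigenvalue cutoff are set by hand. In practice both would need tuning to the data, and the latent space would typically have more than one coordinate.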