Cultivation experiments often produce sparse and irregular time series. Classical approaches based on mechanistic models, such as maximum likelihood fitting or Markov chain Monte Carlo sampling, can easily account for sparsity and time-grid irregularities, but most statistical and machine learning tools are not designed to handle sparse data out of the box. Popular workarounds include various schemes for filling in missing values (imputation) and for interpolating the data onto a regular grid (alignment). However, such methods transfer the biases of the imputation or interpolation models to the target model. We show that Deep Sets neural networks equipped with a triplet encoding of the input data can successfully handle bioprocess data without any need for imputation or alignment. The method is agnostic to the particular nature of the time series and can be adapted to any task, for example online monitoring, predictive control, or design of experiments. In this work, we focus on forecasting. We argue that such an approach is especially suitable for typical cultivation processes, demonstrate the performance of the method on several forecasting tasks using data generated from macrokinetic growth models under realistic conditions, and compare the method with a conventional fitting procedure and with methods based on imputation and alignment.
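To make the idea of feeding sparse, irregular measurements to a set-based network concrete, here is a minimal sketch. It assumes that each observation is encoded as a (time, value, variable index) triplet and that the network has the generic Deep Sets form ρ(Σᵢ φ(xᵢ)). The names `encode_triplets` and `TinyDeepSet`, the variable names, the one-hot variable encoding, the layer sizes, and the random untrained weights are all hypothetical illustrations, not the architecture used in this work. The point is only that variables sampled at different, irregular times can be passed in directly, and that sum pooling makes the output independent of the order of the triplets.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical triplet layout: (time, measured value, variable index).
Triplet = Tuple[float, float, int]


def encode_triplets(series: Dict[str, List[Tuple[float, float]]],
                    variables: Sequence[str]) -> List[Triplet]:
    """Flatten per-variable (time, value) measurements into a set of triplets.

    No imputation or alignment: every observation is kept at its original time.
    """
    index = {name: i for i, name in enumerate(variables)}
    return [(t, y, index[name])
            for name, points in series.items()
            for t, y in points]


class TinyDeepSet:
    """Hypothetical Deep Sets encoder rho(sum_i phi(x_i)) with random, untrained weights."""

    def __init__(self, n_vars: int, hidden: int = 8, out: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.n_vars = n_vars
        in_dim = 2 + n_vars  # time, value, one-hot encoded variable index
        self.w_phi = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(hidden)]
        self.w_rho = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(out)]

    def phi(self, trip: Triplet) -> List[float]:
        t, y, k = trip
        x = [t, y] + [1.0 if j == k else 0.0 for j in range(self.n_vars)]
        # Linear map followed by ReLU, applied to each triplet independently.
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w_phi]

    def __call__(self, triplets: List[Triplet]) -> List[float]:
        pooled = [0.0] * len(self.w_phi)
        for trip in triplets:
            for j, h in enumerate(self.phi(trip)):
                pooled[j] += h  # sum pooling: the result does not depend on triplet order
        # rho: linear map followed by tanh on the pooled representation.
        return [math.tanh(sum(w * p for w, p in zip(row, pooled))) for row in self.w_rho]


# Usage with made-up measurements: each variable has its own irregular sampling times.
variables = ["biomass", "substrate", "product"]  # hypothetical state variables
series = {
    "biomass": [(0.0, 0.1), (5.5, 0.8)],
    "substrate": [(0.0, 10.0), (2.0, 8.7), (7.3, 3.1)],
    "product": [(1.2, 0.05)],
}
triplets = encode_triplets(series, variables)
model = TinyDeepSet(n_vars=len(variables))
out = model(triplets)
out_shuffled = model(list(reversed(triplets)))
print(len(triplets))  # 6
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
          for a, b in zip(out, out_shuffled)))  # True
```

Reordering the triplets leaves the output unchanged up to floating-point rounding, so no common time grid is ever needed. In a real forecasting model, φ and ρ would be trained (for example, to predict future values of the state variables) rather than left with random weights.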