Deep Set Neural Networks for forecasting asynchronous bioprocess timeseries

Cultivation experiments often produce sparse and irregular time series. Classical approaches based on mechanistic models, such as maximum likelihood fitting or Markov chain Monte Carlo sampling, can easily account for sparsity and time-grid irregularities, but most statistical and machine learning tools are not designed to handle sparse data out of the box. Popular workarounds include filling in missing values (imputation) and interpolating onto a regular grid (alignment). However, such methods transfer the biases of the imputation or interpolation model to the target model. We show that Deep Set Neural Networks equipped with triplet encoding of the input data can successfully handle bioprocess data without any imputation or alignment procedure. The method is agnostic to the particular nature of the time series and can be adapted to any task, for example online monitoring, predictive control, or design of experiments. In this work, we focus on forecasting. We argue that this approach is especially suitable for typical cultivation processes, demonstrate its performance on several forecasting tasks using data generated from macrokinetic growth models under realistic conditions, and compare it to a conventional fitting procedure and to methods based on imputation and alignment.
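
To make the idea of triplet encoding concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not specify the encoding or the architecture, so the (time, value, variable id) layout, the one-hot variable encoding, mean pooling, the layer sizes, and all function names (`triplet_encode`, `init_params`, `deep_set_forward`) are assumptions made for illustration. The weights are random and untrained, so the output is not a meaningful forecast; the point is only that irregularly sampled measurements enter the network as an unordered set, with no imputation or alignment step.

```python
import numpy as np

def triplet_encode(series):
    """Flatten {variable: [(time, value), ...]} into rows of (time, value, variable_id)."""
    names = sorted(series)
    rows = [(t, v, names.index(name)) for name in names for t, v in series[name]]
    return np.array(rows, dtype=float), names

def init_params(n_vars, hidden=16, out_dim=1, seed=0):
    """Random, untrained weights (illustrative sizes only)."""
    rng = np.random.default_rng(seed)
    in_dim = 2 + n_vars  # time, value, one-hot variable id
    return {
        "W_phi": rng.normal(0.0, 0.5, (in_dim, hidden)),
        "b_phi": np.zeros(hidden),
        "W_rho": rng.normal(0.0, 0.5, (hidden + 1, out_dim)),
        "b_rho": np.zeros(out_dim),
    }

def deep_set_forward(triplets, t_query, params, n_vars):
    """Deep Set forward pass: per-element map, pooling over the set, then readout."""
    one_hot = np.eye(n_vars)[triplets[:, 2].astype(int)]
    x = np.concatenate([triplets[:, :2], one_hot], axis=1)
    h = np.tanh(x @ params["W_phi"] + params["b_phi"])  # applied to each triplet independently
    pooled = h.mean(axis=0)                             # order-independent aggregation
    z = np.concatenate([pooled, [t_query]])             # condition the readout on the query time
    return z @ params["W_rho"] + params["b_rho"]

# Hypothetical cultivation data: two variables sampled at different, irregular times.
observations = {
    "biomass": [(0.0, 0.10), (2.5, 0.42), (5.0, 1.30)],
    "substrate": [(0.0, 10.0), (4.0, 6.1)],
}
triplets, names = triplet_encode(observations)
params = init_params(n_vars=len(names))
forecast = deep_set_forward(triplets, t_query=6.0, params=params, n_vars=len(names))
```

Because the pooling step aggregates over the set of triplets, shuffling the rows of `triplets` leaves `forecast` unchanged up to floating-point rounding, and adding or dropping a measurement changes only the size of the set, not the shape of the network input.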

