An emerging application of Raman spectroscopy is monitoring the state of chemical reactors during biologic drug production. Raman shift intensities scale linearly with the concentrations of chemical species, so concentrations can be determined analytically in real time using non-destructive, label-free light irradiation. Chemometric algorithms are used to interpret the Raman spectra produced by the complex mixtures in a bioreactor as a reaction evolves. Finding the optimal algorithm for a specific bioreactor environment is challenging because freely available Raman mixture datasets are scarce. The RaMix Python package addresses this challenge by generating synthetic Raman mixture datasets with controllable noise levels, which can be used to assess the utility of different types of chemometric algorithms for real-time monitoring applications. To demonstrate the capabilities of the package and compare the performance of different chemometric algorithms, 48 datasets of simulated spectra were generated with RaMix. The four algorithms tested were partial least squares regression (PLS), a simple neural network, a simple convolutional neural network (simple CNN), and a 1D convolutional neural network with a ResNet architecture (ResNet). The PLS and simple CNN models performed comparably, with PLS slightly outperforming the other models on 83\% of the datasets. The simple CNN outperformed the other models on large, high-noise datasets, demonstrating the superior capability of convolutional neural networks compared to PLS in analyzing noisy spectra. These results demonstrate the promise of CNNs for automatically extracting concentration information from unprocessed, noisy spectra, allowing for better process control of industrial drug production. Code for this project is available at github.com/DexterAntonio/RaMix.
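The idea of simulating mixture spectra with a controllable noise level, and then scoring a calibration model on them, can be illustrated with a minimal NumPy sketch. This is not the RaMix API: the function names, the Gaussian band positions and widths, the Raman shift range, and the noise levels are all hypothetical choices made for illustration. The calibration model is classical least squares, used here as a simple stand-in for PLS so that the example depends only on NumPy.

```python
import numpy as np


def gaussian_peak(shift, center, width, height=1.0):
    """Single Gaussian band on the Raman shift axis (illustrative line shape)."""
    return height * np.exp(-0.5 * ((shift - center) / width) ** 2)


def make_dataset(n_samples, noise_std, rng):
    """Simulate mixture spectra as noisy linear combinations of pure spectra."""
    shift = np.linspace(400.0, 1800.0, 700)  # assumed axis, in cm^-1
    # Three hypothetical pure-component spectra with two bands each.
    pure = np.stack([
        gaussian_peak(shift, 520, 15) + gaussian_peak(shift, 1100, 25, 0.5),
        gaussian_peak(shift, 780, 20) + gaussian_peak(shift, 1450, 30, 0.7),
        gaussian_peak(shift, 1000, 10) + gaussian_peak(shift, 1600, 20, 0.4),
    ])
    conc = rng.uniform(0.0, 1.0, size=(n_samples, pure.shape[0]))
    # Intensities scale linearly with concentration; add Gaussian noise on top.
    noise = rng.normal(0.0, noise_std, size=(n_samples, shift.size))
    return conc @ pure + noise, conc


def fit_cls(spectra, conc):
    """Estimate pure-component spectra K from spectra ~ conc @ K."""
    K, *_ = np.linalg.lstsq(conc, spectra, rcond=None)
    return K  # shape: (n_components, n_points)


def predict_cls(K, spectra):
    """Recover concentrations by least-squares projection onto K."""
    conc_hat, *_ = np.linalg.lstsq(K.T, spectra.T, rcond=None)
    return conc_hat.T


def rmse(a, b):
    """Root-mean-square error over all samples and components."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(0)
results = {}
for noise_std in (0.01, 0.1):  # low-noise and high-noise settings
    train_X, train_y = make_dataset(200, noise_std, rng)
    test_X, test_y = make_dataset(100, noise_std, rng)
    K = fit_cls(train_X, train_y)
    results[noise_std] = rmse(predict_cls(K, test_X), test_y)
    print(f"noise_std={noise_std}: test RMSE={results[noise_std]:.4f}")
```

Repeating the loop over a grid of dataset sizes and noise levels, and swapping in other regressors, gives the kind of comparison described above. A linear model such as this one is expected to degrade as the noise grows, which is the regime where the abstract reports the simple CNN doing best.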