We propose an approach that combines gamma-distributed random variables with log-Gaussian modeling to generate synthetic datasets suitable for training neural networks, addressing the challenge of limited real observations in many applications. We apply this methodology to both Raman and coherent anti-Stokes Raman scattering (CARS) spectra, using experimental spectra to estimate the gamma process parameters. Parameter estimation is performed with Markov chain Monte Carlo methods, yielding a full Bayesian posterior distribution for the model, which can be sampled to generate synthetic data. Additionally, we model the additive and multiplicative background functions for Raman and CARS with Gaussian processes. We train two Bayesian neural networks to estimate the parameters of the gamma process, which can then be used to estimate the underlying Raman spectrum while simultaneously quantifying uncertainty through the estimated parameters of a probability distribution. We apply the trained Bayesian neural networks to experimental Raman spectra of phthalocyanine blue, aniline black, naphthol red, and red 264 pigments, and to experimental CARS spectra of adenosine phosphate, fructose, glucose, and sucrose. The results agree with deterministic point estimates for the underlying Raman and CARS spectral signatures.
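To make the generative idea concrete, the following is a minimal sketch, not the authors' implementation, of how a synthetic spectrum might be drawn from gamma-distributed variables whose mean follows a log-Gaussian process, with an additive Gaussian-process background. The squared-exponential kernel, the function names (`se_kernel`, `sample_gp`, `synthetic_spectrum`), the normalized wavenumber axis, and all numeric values are assumptions chosen for illustration; in the approach described above, such parameters would instead be drawn from the posterior estimated by Markov chain Monte Carlo.

```python
import numpy as np


def se_kernel(x, length_scale, variance):
    """Squared-exponential covariance matrix (an assumed kernel choice)."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def sample_gp(x, length_scale, variance, rng, jitter=1e-6):
    """Draw one zero-mean Gaussian-process sample on the grid x."""
    K = se_kernel(x, length_scale, variance)
    # Diagonal jitter keeps the Cholesky factorization numerically stable.
    K += jitter * variance * np.eye(x.size)
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal(x.size)


def synthetic_spectrum(n=200, shape=50.0, seed=0):
    """Generate one synthetic spectrum under the illustrative assumptions."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)  # normalized wavenumber axis (hypothetical)

    # Log-Gaussian mean: exponentiating a GP sample guarantees positivity.
    mean = np.exp(sample_gp(x, length_scale=0.05, variance=1.0, rng=rng))

    # Gamma draws with shape `shape` and scale mean/shape, so E[signal] = mean
    # and Var[signal] = mean**2 / shape.
    signal = rng.gamma(shape, mean / shape)

    # Smooth additive background: constant offset plus a slowly varying GP.
    background = 2.0 + sample_gp(x, length_scale=0.5, variance=0.1, rng=rng)

    return x, mean, signal + background


x, mean, observed = synthetic_spectrum()
```

Repeating the call with different seeds yields a collection of (`observed`, `mean`) pairs of the kind that could serve as training inputs and targets for a network. A multiplicative background, as mentioned for CARS, could be sketched analogously by multiplying `signal` by a positive smooth function instead of adding one.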