High-dimensional spectral data -- routinely generated in dairy production -- are used to predict a range of traits in milk products. Partial least squares regression (PLSR) is ubiquitous in these prediction tasks. However, PLSR is not typically viewed as arising from statistical inference on a probabilistic model, and parameter uncertainty is rarely quantified. Additionally, PLSR does not easily lend itself to model-based modifications, coherent prediction intervals are not readily available, and the choice of the latent-space dimension, $\mathtt{Q}$, can be subjective and sensitive to data size. We introduce a Bayesian latent-variable model that emulates the desirable properties of PLSR while accounting for parameter uncertainty. A nonparametric shrinkage prior removes the need to choose $\mathtt{Q}$. The flexibility of the proposed Bayesian partial least squares regression (BPLSR) framework is demonstrated through sparsity modifications and multivariate response prediction. The BPLSR framework is applied in two motivating settings: 1) trait prediction from mid-infrared spectral analyses of milk samples, and 2) milk pH prediction from surface-enhanced Raman spectral data. The prediction performance of BPLSR at least matches that of PLSR. Additionally, its correctly calibrated prediction intervals provide richer, more informative inference for stakeholders in dairy production.