A Bayesian Updating Framework for Long-term Multi-Environment Trial Data in Plant Breeding

In variety testing, multi-environment trials (MET) are essential for evaluating the genotypic performance of crop plants. A persistent challenge in the statistical analysis of MET data is the estimation of variance components, which residual (restricted) maximum likelihood (REML) approaches often estimate inaccurately or shrink to exactly zero. At the same time, institutions conducting MET typically hold extensive historical data that could, in principle, improve variance component estimation, yet these data are rarely exploited to their full extent. This paper addresses that gap by proposing a Bayesian framework that systematically integrates historical information to stabilize variance component estimation and better quantify uncertainty. Our Bayesian linear mixed model (BLMM) reformulation uses priors and Markov chain Monte Carlo (MCMC) methods to keep the variance components positive, yielding more realistic distributional estimates. The model incorporates historical prior information by processing MET data in successive historical data windows. The prior and posterior distributions of the variance components are shown to be conjugate, belonging to the inverse gamma and inverse Wishart families. Although Bayesian methodology is increasingly used for analyzing MET data, to the best of our knowledge this study, through the proposed Bayesian updating approach, is one of the first serious attempts to objectively inform priors in this context. To demonstrate the framework, we consider an application in which posterior variance component samples are plugged into an A-optimality experimental design criterion to determine the average optimal allocation of trials to agro-ecological zones in a subdivided target population of environments (TPE).
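The idea of carrying a conjugate prior through successive data windows can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a single variance component whose zero-mean normal effects are observed directly, whereas in a BLMM the effects are latent and would be sampled within the MCMC scheme. The function names, the inverse gamma hyperparameters (2.0, 1.0), the window size, and the simulated data are all hypothetical choices made for illustration.

```python
import random


def update_inverse_gamma(shape, scale, effects):
    """Conjugate update of an inverse gamma prior IG(shape, scale) on the
    variance of zero-mean normal effects (assumed directly observed here)."""
    n = len(effects)
    sum_sq = sum(u * u for u in effects)
    return shape + n / 2.0, scale + sum_sq / 2.0


def sample_inverse_gamma(shape, scale, rng):
    """Draw from IG(shape, scale) as the reciprocal of a gamma draw
    with the same shape and scale parameter 1/scale."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def sequential_update(prior, windows):
    """Process data windows in order; the posterior of one window
    becomes the prior for the next."""
    shape, scale = prior
    history = []
    for effects in windows:
        shape, scale = update_inverse_gamma(shape, scale, effects)
        history.append((shape, scale))
    return history


# Simulated stand-in for three historical windows (hypothetical data).
rng = random.Random(1)
true_sd = 2.0
windows = [[rng.gauss(0.0, true_sd) for _ in range(50)] for _ in range(3)]

history = sequential_update((2.0, 1.0), windows)
final_shape, final_scale = history[-1]

# Posterior mean of the variance (defined because final_shape > 1).
posterior_mean = final_scale / (final_shape - 1.0)

# Posterior draws are strictly positive by construction.
draws = [sample_inverse_gamma(final_shape, final_scale, rng) for _ in range(1000)]
```

In this simplified setting, updating window by window gives the same posterior as a single update on the pooled data, which is what makes the windowed scheme a coherent way to carry historical information forward. Every draw from the inverse gamma posterior is positive, in contrast to a REML estimate that can sit exactly at zero.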
