High-dimensional variable selection, with many more covariates than observations, has been widely studied in standard regression models, but there are still few tools to address it in non-linear mixed-effects models, where data are collected repeatedly on several individuals. In this work, variable selection is approached from a Bayesian perspective and a selection procedure is proposed that combines a spike-and-slab prior with the SAEM algorithm. As in Lasso regression, the set of relevant covariates is selected by exploring a grid of values for the penalisation parameter. The SAEM approach is much faster than a classical MCMC algorithm, and our method shows very good selection performance on simulated data. Its flexibility is demonstrated by implementing it for a variety of non-linear mixed-effects models. The usefulness of the proposed method is illustrated on a genetic marker identification problem relevant to genomic-assisted selection in plant breeding.
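For readers unfamiliar with spike-and-slab priors, one common continuous form is sketched below. It is a generic textbook formulation, not necessarily the exact prior used in this work. All symbols are assumptions introduced for illustration: $\beta_\ell$ is the coefficient of covariate $\ell$, $\delta_\ell$ its inclusion indicator, $\nu_0$ and $\nu_1$ the spike and slab variances, and $\alpha$ the prior inclusion probability.

```latex
\[
\beta_\ell \mid \delta_\ell \;\sim\; (1-\delta_\ell)\,\mathcal{N}(0,\nu_0) + \delta_\ell\,\mathcal{N}(0,\nu_1),
\qquad 0 \le \nu_0 < \nu_1,
\]
\[
\delta_\ell \mid \alpha \;\sim\; \mathrm{Bernoulli}(\alpha), \qquad \ell = 1,\dots,p .
\]
```

Under this kind of prior, a small spike variance $\nu_0$ shrinks the coefficients of irrelevant covariates towards zero. Running the fit over a grid of $\nu_0$ values would then play a role comparable to the Lasso regularisation path, assuming the spike variance is the quantity that acts as the penalisation parameter.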