Optimization of breeding program design through stochastic simulation with evolutionary algorithms

Planning a modern breeding program and allocating its resources effectively is a complex task. Design and operational management strongly affect the success of a breeding program: changing parameters such as the number of selected, phenotyped, or genotyped individuals alters genetic gain, genetic diversity, and costs. Design parameters must therefore be assessed and balanced carefully, taking into account the trade-offs between different breeding goals and their associated costs. In a previous study, we optimized the resource allocation strategy of a dairy cattle breeding scheme by combining stochastic simulations with kernel regression, aiming to maximize a target function containing genetic gain and the inbreeding rate under a given budget. However, the kernel regression method requires a high number of simulations when a breeding program has many parameters, which weakens its effectiveness. In this work, we propose an optimization framework that builds on the concepts of kernel regression but additionally uses an evolutionary algorithm to allow for more effective and more general optimization. The key idea is to consider a set of potential parameterizations of the breeding program, evaluate their performance with stochastic simulations, and use the outputs to derive new parameterizations to test in an iterative procedure. The evolutionary algorithm was implemented in a Snakemake pipeline to allow efficient scaling on large distributed computing platforms. The algorithm converged to the same optimum with a massively reduced number of simulations. Incorporating class variables and accounting for a higher number of parameters in the optimization pipeline thus leads to substantially reduced computing time and better scaling for the desired optimization of a breeding program.
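To make the iterative procedure concrete, the following is a minimal sketch of a generic evolutionary loop, not the pipeline described in this work. Everything in it is an assumption made for illustration: the two parameters (`n_selected`, `n_genotyped`), the toy target function inside `simulate`, the noise level, and all function names are hypothetical. Kernel regression, class variables, budget constraints, and the Snakemake orchestration are deliberately left out.

```python
import random
from statistics import mean

# Location of the toy optimum; an arbitrary value assumed for illustration.
TOY_OPTIMUM = (30, 500)


def simulate(params, rng):
    """Hypothetical stand-in for one stochastic breeding-program simulation.

    Returns a noisy value of a toy target function. A real study would run
    a full simulation and combine genetic gain and inbreeding rate here.
    """
    n_selected, n_genotyped = params
    signal = (-((n_selected - TOY_OPTIMUM[0]) ** 2) / 100.0
              - ((n_genotyped - TOY_OPTIMUM[1]) ** 2) / 10000.0)
    return signal + rng.gauss(0.0, 0.5)


def evaluate(params, rng, n_reps=5):
    """Average several replicates to dampen simulation noise."""
    return mean(simulate(params, rng) for _ in range(n_reps))


def clip(value, lo, hi):
    return max(lo, min(hi, value))


def make_offspring(parent_a, parent_b, rng, bounds, steps):
    """Recombine two parameterizations and apply a small integer mutation."""
    child = []
    for a, b, (lo, hi), step in zip(parent_a, parent_b, bounds, steps):
        gene = a if rng.random() < 0.5 else b   # uniform recombination
        gene += rng.randint(-step, step)        # mutation
        child.append(clip(gene, lo, hi))        # stay inside the search space
    return tuple(child)


def optimize(bounds, steps, pop_size=20, n_parents=5, n_iter=30, seed=1):
    """Iteratively evaluate parameterizations and derive new ones."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]
    for _ in range(n_iter):
        # Rank the current parameterizations by their simulated performance.
        ranked = sorted(population, key=lambda p: evaluate(p, rng), reverse=True)
        parents = ranked[:n_parents]
        # Keep the parents and fill the rest of the population with offspring.
        offspring = [make_offspring(rng.choice(parents), rng.choice(parents),
                                    rng, bounds, steps)
                     for _ in range(pop_size - n_parents)]
        population = parents + offspring
    # Re-evaluate with more replicates before reporting a final candidate.
    return max(population, key=lambda p: evaluate(p, rng, n_reps=20))


if __name__ == "__main__":
    best = optimize(bounds=[(5, 100), (100, 2000)], steps=[3, 50])
    print("best parameterization found:", best)
```

Because each evaluation is noisy, the sketch averages several replicates per parameterization and re-evaluates the final population with more replicates. In a distributed setting, the calls to `evaluate` within one iteration are independent of each other and could be run in parallel.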
