Generating novel, biologically plausible three-dimensional morphological structures is a fundamental challenge in computational evolutionary biology. It is hampered by extreme data scarcity and by the requirement that generated shapes respect the phylogenetic relationships among species. We present PhyloSDF, a phylogenetically conditioned neural generative model for 3D biological morphology that integrates two innovations: (1) a DeepSDF auto-decoder regularized by a novel Phylogenetic Consistency Loss, which structures the latent space so that it correlates with evolutionary distances (Pearson r=0.993); and (2) a Residual Conditional Flow Matching (Residual CFM) architecture that factorizes generation into an analytic species-centroid lookup and a learned residual prediction, enabling generation from as few as ~4 specimens per species. We evaluate PhyloSDF on 100 micro-CT-scanned skulls of Darwin's finches and their relatives, spanning 24 species. The model generates novel meshes that achieve 88-129% of real intra-species variation at the code level, and all 180 generated meshes are verified as non-memorized. Residual CFM surpasses denoising diffusion (which fails entirely at this scale), standard flow matching (which mode-collapses to 3-6% variation), and a Gaussian mixture baseline in both fidelity (Chamfer Distance 0.00181 vs. 0.00190) and morphometric Fréchet distance (10,641 vs. 13,322). Leave-one-species-out experiments across 18 species demonstrate phylogenetic extrapolation capability, and smooth latent interpolations produce biologically plausible ancestral skull reconstructions.
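To make the latent-phylogeny correlation concrete, the following minimal sketch shows one way a Pearson r between pairwise latent distances and pairwise evolutionary distances could be computed. It is an illustration under stated assumptions, not the evaluation protocol of this work: the use of Euclidean distance between species-level codes, the function names, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def pairwise_latent_distances(codes):
    """Euclidean distance for every unordered pair of latent codes.

    Assumption: one code per species, compared with Euclidean distance.
    """
    return [math.dist(a, b) for a, b in combinations(codes, 2)]


def pearson_r(xs, ys):
    """Standard Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: three species codes in a 2-D latent space.
species_codes = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]

# Hypothetical evolutionary distances, listed in the same pair order
# that combinations() produces: (0, 1), (0, 2), (1, 2).
phylo_distances = [2.0, 8.0, 6.0]

latent_distances = pairwise_latent_distances(species_codes)
r = pearson_r(latent_distances, phylo_distances)
print(f"Pearson r between latent and phylogenetic distances: {r:.3f}")
```

In this toy case the evolutionary distances are an exact multiple of the latent distances, so the correlation is perfect. With real data, the pair ordering of the two distance lists must match for the statistic to be meaningful.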