Morphological traits provide important evidence for phylogenetic reconstruction and evolutionary relationship analysis. Recent image-based approaches have introduced deep learning, particularly convolutional models, to derive morphological features from specimen images, but these methods generally rely on single-modality visual representations and do not explicitly incorporate morphological semantics. This study proposes a morphology-aware multimodal alignment framework for insect phylogenetic reconstruction. The framework combines specimen images with curated morphological descriptions by adapting a vision transformer through parameter-efficient fine-tuning and supervised contrastive learning, followed by image-text alignment in a shared latent space. The learned image embeddings are then used as continuous traits for Bayesian phylogenetic reconstruction. On the public Rove-Tree-11 dataset, comparative and ablation experiments across multiple visual backbones and feature adaptation strategies demonstrate that multimodal alignment improves topological agreement with the reference phylogeny. The results indicate that the proposed framework can derive morphology-aware visual traits for computational phylogenetic reconstruction.
翻译:暂无翻译