Deep-learning-based pan-phenomic data reveals the explosive evolution of avian visual disparity

From arXiv. Readers from computer science may be interested in Sections 2.1, 2.2, 3.1, 4.1, and 4.2. These sections discuss interpretability and representation learning, especially the texture-versus-shape problem, and highlight our model's ability to overcome texture bias and capture overall shape features. (They are included there mainly to establish the biological validity of the model.)

The evolution of biological morphology is critical for understanding the diversity of the natural world, yet traditional analyses often involve subjective biases in the selection and coding of morphological traits. This study employs deep learning techniques, utilising a ResNet34 model capable of recognising over 10,000 bird species, to explore avian morphological evolution. We extract weights from the model's final fully connected (fc) layer and investigate the semantic alignment between the high-dimensional embedding space learned by the model and biological phenotypes. The results demonstrate that the high-dimensional embedding space encodes phenotypic convergence. Subsequently, we assess the morphological disparity among various taxa and evaluate the association between morphological disparity and species richness, demonstrating that species richness is the primary driver of morphospace expansion. Moreover, the disparity-through-time analysis reveals a visual "early burst" after the K-Pg extinction. While mainly aimed at evolutionary analysis, this study also provides insights into the interpretability of deep neural networks. We demonstrate that hierarchical semantic structures (biological taxonomy) emerge in the high-dimensional embedding space even though the model was trained on flat labels. Furthermore, using adversarial examples, we provide evidence that, in this task, our model can overcome texture bias and learn holistic shape representations (body plans), challenging the prevailing view that CNNs rely primarily on local textures.
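
To make the idea of "morphological disparity among taxa" more concrete, here is a minimal sketch of how a per-taxon disparity could be computed from per-species vectors. It rests on several assumptions that are not stated in the abstract: each species is represented by one vector (for example, one row of the final fc layer's weight matrix), disparity is measured as the sum of per-dimension variances (a common choice in morphometrics, but possibly not the metric the paper uses), and the names `sum_of_variances`, `disparity_by_taxon`, and the toy species and taxon labels are all hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def sum_of_variances(vectors):
    """Disparity of one group: variance across members, summed over dimensions.

    ASSUMPTION: sum of variances is used here for illustration only;
    the abstract does not say which disparity metric the study uses.
    """
    if len(vectors) < 2:
        return 0.0  # a single species has no spread
    dims = zip(*vectors)  # transpose: one tuple of values per dimension
    return sum(pvariance(d) for d in dims)


def disparity_by_taxon(embeddings, taxon_of):
    """Group species vectors by taxon and compute each group's disparity."""
    groups = defaultdict(list)
    for species, vec in embeddings.items():
        groups[taxon_of[species]].append(vec)
    return {taxon: sum_of_variances(vecs) for taxon, vecs in groups.items()}


# Toy 3-dimensional vectors standing in for high-dimensional fc-layer rows.
embeddings = {
    "sp_a": [0.0, 0.0, 1.0],
    "sp_b": [2.0, 0.0, 1.0],
    "sp_c": [1.0, 1.0, 1.0],
    "sp_d": [1.0, 1.0, 1.0],
}
# Hypothetical flat mapping from species to a higher taxon.
taxon_of = {
    "sp_a": "order_X",
    "sp_b": "order_X",
    "sp_c": "order_Y",
    "sp_d": "order_Y",
}

result = disparity_by_taxon(embeddings, taxon_of)
print(result)
```

In this toy data, the two species in `order_X` differ along one dimension, so that group has non-zero disparity, while the two identical species in `order_Y` have none. Comparing such per-taxon values against the number of species in each taxon is one simple way to probe the disparity-versus-richness association the abstract describes.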
