Artificial Intelligence is affecting all fields of knowledge. In particular, the Deep Learning paradigm enables the development of data analysis tools that support subject matter experts in a variety of sectors, from physics to the recognition of ancient languages. Palaeontology is now following this trend as well. This study explores the capability of Convolutional Neural Networks (CNNs), a class of Deep Learning algorithms designed for computer vision tasks, to classify images of isolated fossil shark teeth gathered from online datasets as well as from the authors' experience with Peruvian Miocene and Italian Pliocene fossil assemblages. The shark taxa included in the final, composite dataset (which consists of more than one thousand images) represent both extinct and extant genera, namely Carcharhinus, Carcharias, Carcharocles, Chlamydoselachus, Cosmopolitodus, Galeocerdo, Hemipristis, Notorynchus, Prionace and Squatina. We developed a CNN, named SharkNet-X, specifically tailored to our recognition task, which reaches a 5-fold cross-validated mean accuracy of 0.85 in identifying images containing a single shark tooth. Furthermore, we visualized the features extracted from the images by the last dense layer of the CNN using the dimensionality reduction technique t-SNE. In addition, to understand and explain the behaviour of the CNN while providing a palaeontological perspective on the results, we introduced the explainability method SHAP. To the best of our knowledge, this is the first time this method has been applied in the field of palaeontology. The main goal of this work is to showcase how Deep Learning techniques can aid in identifying isolated fossil shark teeth, paving the way for new information tools that automate the recognition and classification of fossils.
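As a rough illustration of what a 5-fold cross-validated mean accuracy measures, the sketch below splits a labelled set into five folds, holds each fold out in turn, and averages the held-out accuracies. It is not the SharkNet-X pipeline: the helper names (`k_fold_indices`, `cross_validated_accuracy`), the majority-class stand-in classifier, and the toy labels are all assumptions made for this example.

```python
import random
from collections import Counter
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(samples, labels, fit, predict, k=5, seed=0):
    """Return (mean accuracy, per-fold accuracies) over k held-out folds."""
    folds = k_fold_indices(len(samples), k, seed)
    fold_accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        # Train on every sample that is not in the held-out fold.
        train = [i for i in range(len(samples)) if i not in held_out_set]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        # Score only on the held-out fold.
        correct = sum(predict(model, samples[i]) == labels[i] for i in held_out)
        fold_accuracies.append(correct / len(held_out))
    return mean(fold_accuracies), fold_accuracies


# Hypothetical stand-in classifier (NOT a CNN): it always predicts the
# most common label seen during training.
def fit_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


# Toy data invented for illustration only; it is not the paper's dataset.
toy_samples = list(range(20))
toy_labels = ["Carcharhinus"] * 15 + ["Galeocerdo"] * 5

mean_acc, per_fold = cross_validated_accuracy(
    toy_samples, toy_labels, fit_majority, predict_majority, k=5
)
print(f"per-fold accuracy: {per_fold}")
print(f"mean accuracy: {mean_acc:.2f}")
```

In a real setting, `fit` and `predict` would wrap the training and inference of the image classifier, and a stratified split would probably be preferable so that each genus appears in every fold.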