Machine-learning techniques, especially deep convolutional neural networks, are pivotal for image-based identification of biological species in many citizen science platforms. In this paper, we describe the construction of a dataset for the Portuguese native flora based on publicly available research-grade datasets, and the derivation of a high-accuracy model from it using off-the-shelf deep convolutional neural networks. We anchored the dataset in high-quality data provided by Sociedade Portuguesa de Botânica and added further sampled data from research-grade datasets available from GBIF. We find that, with careful dataset design, off-the-shelf machine-learning cloud services such as Google's AutoML Vision produce accurate models, with results comparable to those of Pl@ntNet, a state-of-the-art citizen science platform. The best model we derived, dubbed Floralens, has been integrated into the public website of Project Biolens, which also hosts models for other taxa. The dataset used to train the model is publicly available on Zenodo.