Floralens: a Deep Learning Model for the Portuguese Native Flora

Machine-learning techniques, especially deep convolutional neural networks, are pivotal for image-based identification of biological species in many citizen science platforms. In this paper, we describe the construction of a dataset for the Portuguese native flora based on publicly available research-grade datasets, and the derivation of a high-accuracy model from it using off-the-shelf deep convolutional neural networks. We anchored the dataset in high-quality data provided by Sociedade Portuguesa de Botânica and added further sampled data from research-grade datasets available from the Global Biodiversity Information Facility (GBIF). We find that with careful dataset design, off-the-shelf machine-learning cloud services such as Google's AutoML Vision produce accurate models, with results comparable to those of Pl@ntNet, a state-of-the-art citizen science platform. The best model we derived, dubbed Floralens, has been integrated into the public website of Project Biolens, where we gather models for other taxa as well. The dataset used to train the model is also publicly available on Zenodo.
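The abstract does not spell out how the Sociedade Portuguesa de Botânica (SPB) anchor data and the sampled GBIF data are combined. The sketch below illustrates one plausible reading under stated assumptions. The species list comes from the anchor set. Every anchor image is kept. Each species is then topped up with a random sample of research-grade GBIF images until it reaches a per-species target. The `Observation` record, the `build_dataset` function, the `target_per_species` parameter and the `"SPB"`/`"GBIF"` labels are all hypothetical and are not the authors' actual pipeline.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    species: str    # accepted scientific name
    image_url: str  # location of the image (hypothetical field)
    source: str     # hypothetical provenance label, e.g. "SPB" or "GBIF"


def build_dataset(anchor, supplementary, target_per_species, seed=0):
    """Hypothetical sketch: keep every anchor image, then top up each anchor
    species with randomly sampled supplementary images until it reaches
    target_per_species. Species absent from the anchor set are ignored."""
    rng = random.Random(seed)  # fixed seed for a reproducible sample

    # Group the trusted anchor images by species; this defines the label set.
    by_species = defaultdict(list)
    for obs in anchor:
        by_species[obs.species].append(obs)

    # Collect candidate supplementary images for anchor species only,
    # skipping any image URL already present (simple de-duplication).
    seen = {obs.image_url for obs in anchor}
    extra = defaultdict(list)
    for obs in supplementary:
        if obs.species in by_species and obs.image_url not in seen:
            extra[obs.species].append(obs)
            seen.add(obs.image_url)

    dataset = []
    for species, images in by_species.items():
        dataset.extend(images)  # anchor images are always kept
        need = max(0, target_per_species - len(images))
        pool = extra[species]
        # Sample without replacement; take the whole pool if it is too small.
        dataset.extend(rng.sample(pool, min(need, len(pool))))
    return dataset


if __name__ == "__main__":
    anchor = [Observation("Quercus suber", f"spb/qs{i}.jpg", "SPB") for i in range(3)]
    anchor.append(Observation("Cistus ladanifer", "spb/cl0.jpg", "SPB"))
    gbif = [Observation("Quercus suber", f"gbif/qs{i}.jpg", "GBIF") for i in range(10)]
    gbif += [Observation("Cistus ladanifer", f"gbif/cl{i}.jpg", "GBIF") for i in range(2)]
    gbif.append(Observation("Pinus pinaster", "gbif/pp0.jpg", "GBIF"))

    ds = build_dataset(anchor, gbif, target_per_species=5)
    # Quercus suber: 3 anchor + 2 sampled = 5; Cistus ladanifer: 1 + 2 = 3
    # (pool exhausted); Pinus pinaster is dropped because SPB lacks it.
    print(len(ds))
```

Keeping the anchor set as the source of the label set is one way of "anchoring" the data. A real pipeline would also need taxonomic name reconciliation between the two sources, and that step is not shown here.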
