Planktonzilla: Multimodal dataset and models for understanding plankton ecosystems

Marine plankton underpin aquatic food webs and play a key role in global CO2 sequestration, making reliable species identification critical for understanding ocean health and climate feedbacks. Existing classification models perform well on individual collections but fail to generalize across instruments and environments because their training datasets are isolated and their labels inconsistent. To address this, we introduce Planktonzilla-17M, a unified dataset consolidating publicly available plankton image collections spanning thirteen imaging systems. It comprises 17.4 million images with standardized taxonomy and geo-environmental metadata, including 3.74 million plankton images spanning over 602 taxonomic classes, of which 201 are identified at the species level, making it the largest and most comprehensive plankton image dataset to date. Using this large-scale dataset, we perform a controlled comparison between supervised and CLIP-style image-text training on a shared ViT backbone. We find that a supervised classifier matches or exceeds CLIP-style training when the latter is trained using taxonomic lineage as text. We further observe that BioCLIP and BioCLIP2 perform poorly on plankton in zero-shot and few-shot settings. Leveraging Planktonzilla-17M improves plankton classification performance, highlighting the limitations of current biological foundation models in marine imaging domains.
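To make the two training objectives in this comparison concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the "taxonomic lineage as text" is formed by joining rank names into one caption and that the CLIP-style objective is the usual symmetric contrastive loss. The function names, the temperature value, and the example lineage are illustrative assumptions; the embeddings stand in for the outputs of the shared backbone and a text encoder.

```python
import math

def lineage_text(ranks):
    """Join the known taxonomic ranks into one caption, skipping missing ranks."""
    return " ".join(r for r in ranks if r)

def log_softmax(row):
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def cross_entropy(logits, targets):
    """Supervised objective: mean negative log-likelihood of the target index."""
    return -sum(log_softmax(row)[t] for row, t in zip(logits, targets)) / len(logits)

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def clip_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style objective: the i-th image should match the i-th caption.

    The temperature is a hypothetical fixed value chosen for illustration.
    """
    imgs = [normalize(v) for v in image_emb]
    txts = [normalize(v) for v in text_emb]
    # Cosine similarity of every image against every caption, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]
    sim_t = [list(col) for col in zip(*sim)]  # caption-to-image direction
    targets = list(range(len(imgs)))
    return 0.5 * (cross_entropy(sim, targets) + cross_entropy(sim_t, targets))

# Illustrative lineage with one missing rank (family), which is skipped.
caption = lineage_text(["Animalia", "Arthropoda", "Copepoda", "Calanoida", None, "Calanus"])
```

In this simple form, two images of the same class in one batch carry identical captions yet are treated as negatives of each other. The supervised loss avoids this because it scores each image against a fixed set of class indices.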

