Plankton recognition opens new possibilities for studying various environmental aspects and offers an interesting real-world context for developing domain adaptation (DA) methods. Different imaging instruments cause domain shift between datasets, which hampers the development of general plankton recognition methods. A promising remedy is DA, which allows a model trained on one instrument to be adapted to other instruments. In this paper, we present a new DA dataset called DAPlankton, which consists of phytoplankton images obtained with different instruments. Phytoplankton presents a challenging DA problem due to the fine-grained nature of the task and the high class imbalance in real-world datasets. DAPlankton consists of two subsets. DAPlankton_LAB contains images of cultured phytoplankton, providing a balanced dataset with minimal label uncertainty. DAPlankton_SEA consists of images collected from the Baltic Sea, providing challenging real-world data with large intra-class variance and class imbalance. We further present a benchmark comparison of three widely used DA methods.