Accurate identification of fungi species presents a unique challenge in computer vision due to fine-grained inter-species variation and high intra-species variation. This paper presents our approach for the FungiCLEF 2025 competition, which focuses on few-shot fine-grained visual categorization (FGVC) using the FungiTastic Few-Shot dataset. Our team (DS@GT) experimented with multiple vision transformer models, data augmentation, weighted sampling, and the incorporation of textual information. We also explored generative AI models for zero-shot classification using structured prompting, but found that they significantly underperformed vision-based models. Our final model outperformed both competition baselines and highlighted the effectiveness of domain-specific pretraining and balanced sampling strategies. Our approach ranked 35/74 on the private test set in post-completion evaluation, which suggests that additional work can be done on metadata selection and domain-adapted multi-modal learning. Our code is available at https://github.com/dsgt-arc/fungiclef-2025.