Insect classification is important for agricultural management and ecological research, because insects directly affect crop health and production. However, the task remains challenging due to the complex characteristics of insects, class imbalance, and the scale of modern datasets. To address these issues, we propose BioAutoML-NAS, the first BioAutoML model to use multimodal data (images and metadata). It applies neural architecture search (NAS) to the image data to automatically learn the best operation for each connection within each cell. Multiple cells are stacked to form the full network, each extracting detailed image feature representations. A multimodal fusion module combines the image embeddings with metadata, allowing the model to use both visual and categorical biological information to classify insects. An alternating bi-level optimization strategy jointly updates the network weights and the architecture parameters, while zero operations prune less important connections, producing sparse, efficient, and high-performing architectures. Extensive evaluation on the BIOSCAN-5M dataset shows that BioAutoML-NAS achieves 96.81% accuracy, 97.46% precision, 96.81% recall, and a 97.05% F1 score, outperforming state-of-the-art transfer learning, transformer, AutoML, and NAS methods by margins of approximately 8% to 16%. Further validation on the Insects-1M dataset yields 93.25% accuracy, 93.71% precision, 92.74% recall, and a 93.22% F1 score. These results demonstrate that BioAutoML-NAS provides accurate and reliable insect classification that supports modern sustainable farming.
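To make the search mechanism concrete, the following is a minimal, framework-free sketch of one searchable connection. It assumes a DARTS-style continuous relaxation and a first-order approximation of the bi-level problem; the abstract does not specify either, so treat both as assumptions. The toy scalar operations `zero`, `identity`, and `scale`, the squared-error loss, and all function names (`mixed_edge`, `search`, `derive_edge`) are hypothetical stand-ins, not the authors' code. Each edge outputs a softmax-weighted mix of candidate operations. Training alternates between a weight step on the training split and an architecture step on the validation split. Afterwards, the strongest operation is kept, and an edge whose strongest operation is `zero` is pruned.

```python
import math

# Hypothetical candidate ops on one edge (toy scalar stand-ins for conv/pool ops).
# "zero" contributes nothing, so selecting it prunes the edge.
OPS = ["zero", "identity", "scale"]


def softmax(alphas):
    m = max(alphas)  # subtract max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def op_outputs(x, w):
    """Output of each candidate op for input x; w is the learnable weight of 'scale'."""
    return [0.0, x, w * x]


def mixed_edge(x, w, alphas):
    """Continuous relaxation: softmax(alphas)-weighted sum over all candidate ops."""
    probs = softmax(alphas)
    return sum(p * o for p, o in zip(probs, op_outputs(x, w)))


def loss(data, w, alphas):
    """Mean squared error of the mixed edge on (input, target) pairs."""
    return sum((mixed_edge(x, w, alphas) - t) ** 2 for x, t in data) / len(data)


def grads(data, w, alphas):
    """Analytic gradients of `loss` w.r.t. the weight w and each architecture parameter."""
    probs = softmax(alphas)
    g_w = 0.0
    g_a = [0.0] * len(alphas)
    for x, t in data:
        outs = op_outputs(x, w)
        y = sum(p * o for p, o in zip(probs, outs))
        d_y = 2.0 * (y - t) / len(data)
        g_w += d_y * probs[2] * x  # dy/dw = p_scale * x
        for k in range(len(alphas)):
            g_a[k] += d_y * probs[k] * (outs[k] - y)  # dy/dalpha_k = p_k * (o_k - y)
    return g_w, g_a


def search(train, val, steps=500, lr_w=0.05, lr_a=0.05):
    """First-order alternating bi-level optimization: the weight steps on the
    training loss, the architecture parameters step on the validation loss."""
    w, alphas = 1.0, [0.0] * len(OPS)
    for _ in range(steps):
        g_w, _ = grads(train, w, alphas)
        w -= lr_w * g_w
        _, g_a = grads(val, w, alphas)
        alphas = [a - lr_a * g for a, g in zip(alphas, g_a)]
    return w, alphas


def derive_edge(alphas):
    """Keep the strongest op; return None (edge pruned) if the winner is 'zero'."""
    best = OPS[max(range(len(OPS)), key=lambda k: alphas[k])]
    return None if best == "zero" else best


if __name__ == "__main__":
    # Toy data following t = 3x, split into disjoint training and validation sets.
    train = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5)]
    val = [(x, 3.0 * x) for x in (0.75, 1.25)]
    w, alphas = search(train, val)
    print("kept op:", derive_edge(alphas), "w =", round(w, 3))
```

In a real cell, each edge would mix convolution, pooling, skip, and zero operations over feature maps. The two updates would typically use separate optimizers over the weight and architecture parameter groups. The toy version only shows why keeping `zero` in the candidate set lets the search itself produce sparse cells.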