Insect classification is important for agricultural management and ecological research because it directly affects crop health and production. The task remains challenging, however, because of the complex characteristics of insects, class imbalance, and the scale of the datasets involved. To address these issues, we propose BioAutoML-NAS, the first BioAutoML model to use multimodal data (images and metadata). It applies neural architecture search (NAS) to the image input to automatically learn the best operation for each connection within each cell. Multiple cells are stacked to form the full network, each extracting detailed image feature representations. A multimodal fusion module combines the image embeddings with metadata, allowing the model to use both visual and categorical biological information to classify insects. An alternating bi-level optimization strategy jointly updates the network weights and the architecture parameters, while zero operations remove less important connections, producing sparse, efficient, and high-performing architectures. Extensive evaluation on the BIOSCAN-5M dataset shows that BioAutoML-NAS achieves 96.81% accuracy, 97.46% precision, 96.81% recall, and a 97.05% F1 score, outperforming state-of-the-art transfer learning, transformer, AutoML, and NAS methods by approximately 16%, 10%, and 8%, respectively. Further validation on the Insects-1M dataset yields 93.25% accuracy, 93.71% precision, 92.74% recall, and a 93.22% F1 score. These results demonstrate that BioAutoML-NAS provides accurate, confident insect classification that supports modern sustainable farming.
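The search and fusion ideas summarized above can be illustrated with a small sketch. It is not the BioAutoML-NAS implementation: the candidate operations, the architecture parameter values, and the metadata vocabulary are all hypothetical stand-ins, and fusion is assumed here to be plain concatenation. The sketch shows a single connection computed as a softmax-weighted mixture of candidate operations, a zero operation that marks the connection for removal when it dominates, and an image embedding joined with one-hot encoded categorical metadata.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate operations on a feature vector. In a real search
# space these would be convolutions, pooling layers, skip connections, etc.
CANDIDATE_OPS = {
    "zero": lambda x: [0.0 for _ in x],        # contributes nothing to the output
    "identity": lambda x: list(x),             # skip connection
    "scale2": lambda x: [2.0 * v for v in x],  # placeholder for a learned op
}


def mixed_op(x, alphas):
    """Softmax-weighted sum of every candidate op on one connection."""
    names = list(CANDIDATE_OPS)
    weights = softmax([alphas[name] for name in names])
    out = [0.0] * len(x)
    for name, weight in zip(names, weights):
        y = CANDIDATE_OPS[name](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


def discretize(alphas):
    """Keep the strongest op; return None when 'zero' wins (connection removed)."""
    best = max(alphas, key=alphas.get)
    return None if best == "zero" else best


def one_hot(value, vocabulary):
    """Encode one categorical metadata value as a one-hot vector."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def fuse(image_embedding, metadata_value, vocabulary):
    """Assumed fusion scheme: concatenate image features and encoded metadata."""
    return list(image_embedding) + one_hot(metadata_value, vocabulary)


# Illustrative architecture parameters for two connections (made-up values).
alphas_kept = {"zero": -1.0, "identity": 0.5, "scale2": 2.0}
alphas_pruned = {"zero": 3.0, "identity": 0.1, "scale2": 0.2}

kept_choice = discretize(alphas_kept)      # strongest non-zero op is selected
pruned_choice = discretize(alphas_pruned)  # zero op dominates, so the edge is dropped
fused = fuse([0.1, 0.2], "b", ["a", "b", "c"])
```

In a full system the architecture parameters would be learned rather than fixed. Under the alternating bi-level scheme described above, the network weights and the architecture parameters are updated in turn, and discretization is applied once the search has finished.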