Out-of-distribution detection algorithms for robust insect classification

Deep learning-based approaches have produced models with good insect classification accuracy; however, most of these models are best suited to controlled environmental conditions. A primary emphasis of researchers is to deploy identification and classification models in real agricultural fields, which is challenging because input images that are far out of distribution (e.g., images of vehicles, animals, or humans, a blurred image of an insect, or an insect class on which the model has not been trained) can produce an incorrect insect classification. Out-of-distribution (OOD) detection algorithms provide a promising avenue to overcome this challenge, as they ensure that a model abstains from making an incorrect classification prediction on images of non-insects and/or untrained insect classes. We generate and evaluate the performance of state-of-the-art OOD algorithms on insect detection classifiers. These algorithms represent a diversity of methods for addressing the OOD problem. Specifically, we focus on extrusive algorithms, i.e., algorithms that wrap around a well-trained classifier without the need for additional co-training. We compare three OOD detection algorithms: (i) Maximum Softmax Probability, which uses the softmax value as a confidence score; (ii) a Mahalanobis distance-based algorithm, which uses a generative classification approach; and (iii) an Energy-Based algorithm, which maps the input data to a scalar value called energy. We perform an extensive series of evaluations of these OOD algorithms across three performance axes: (a) \textit{Base model accuracy}: How does the accuracy of the classifier impact OOD performance? (b) \textit{Level of dissimilarity to the domain}: How does the degree of dissimilarity between an input and the training domain impact OOD performance? and (c) \textit{Data imbalance}: How sensitive is OOD performance to the imbalance in per-class sample size?
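To make the three scores concrete, the following is a minimal NumPy sketch of how each one could be computed from the outputs of an already trained classifier. It is an illustration under stated assumptions, not the implementation evaluated in this work: the function names, the `temperature` parameter, and the use of a single feature layer with a tied covariance are all hypothetical choices, and the Mahalanobis variant shown here omits the input preprocessing and multi-layer feature ensembling used in the published method.

```python
import numpy as np

def msp_score(logits):
    """Maximum softmax probability; higher suggests in-distribution."""
    z = logits - logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs.max(axis=1)

def energy_score(logits, temperature=1.0):
    """Energy E(x) = -T * logsumexp(logits / T); lower suggests in-distribution."""
    z = logits / temperature
    m = z.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(z - m).sum(axis=1))
    return -temperature * lse

def fit_class_gaussians(features, labels):
    """Estimate per-class means and the inverse of a shared (tied) covariance."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Centre every sample on the mean of its own class.
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)
    precision = np.linalg.pinv(cov)
    return means, precision

def mahalanobis_score(features, means, precision):
    """Negative squared distance to the closest class mean; higher suggests in-distribution."""
    diff = features[:, None, :] - means[None, :, :]  # shape (n, classes, dim)
    d2 = np.einsum("ncd,de,nce->nc", diff, precision, diff)
    return -d2.min(axis=1)
```

In each case the wrapped classifier is left unchanged: a threshold on the score, chosen on held-out in-distribution data, decides whether the model returns its predicted insect class or abstains.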
