Image outlier detection (OD) is crucial for ensuring the quality and accuracy of image datasets used in computer vision tasks. Most OD algorithms, however, were not designed for image data, so applying them to images often yields suboptimal results. In this work, we propose RANSAC-NN, a novel unsupervised OD algorithm specifically designed for images. By comparing images in a RANSAC-based approach, our algorithm automatically predicts the outlier score of each image without additional training or label information. We evaluate RANSAC-NN against state-of-the-art OD algorithms on 15 diverse datasets. Without any hyperparameter tuning, RANSAC-NN compares favorably with other algorithms in almost every dataset category. Furthermore, we provide a detailed analysis of each RANSAC-NN component, and we demonstrate its potential application to detecting mislabeled images. Code for RANSAC-NN is provided at https://github.com/mxtsai/ransac-nn
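To give a rough sense of what scoring images by repeated comparison against random samples can look like, the following Python sketch may help. It is an illustration only and not the RANSAC-NN implementation, which is available in the repository above. It assumes each image has already been reduced to a feature vector, and the names `cosine` and `outlier_scores`, as well as the parameters `sample_size`, `iterations`, and `seed`, are hypothetical choices made for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def outlier_scores(features, sample_size=5, iterations=50, seed=0):
    """Illustrative scoring (an assumption, not the authors' method):
    1 minus the mean nearest-neighbor similarity to random samples."""
    rng = random.Random(seed)
    n = len(features)
    totals = [0.0] * n
    for _ in range(iterations):
        # Draw a random subset of the dataset to compare against.
        sample = rng.sample(range(n), min(sample_size, n - 1))
        for i, f in enumerate(features):
            # Nearest neighbor within the sample, excluding the vector itself.
            sims = [cosine(f, features[j]) for j in sample if j != i]
            totals[i] += max(sims) if sims else 0.0
    return [1.0 - t / iterations for t in totals]


# Toy data: eight similar vectors plus one vector pointing the opposite way.
features = [[1.0, 0.1 * k] for k in range(8)] + [[-1.0, 0.05]]
scores = outlier_scores(features)
most_anomalous = scores.index(max(scores))
```

In this toy setup the opposite-facing vector receives the highest score, because no random sample contains a close neighbor for it. No training or labels are involved, which mirrors the unsupervised setting described above.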