Approaches to large-scale image recognition with more than 50,000 categories

Although current computer vision (CV) models achieve high accuracy on small-scale image classification datasets with hundreds or thousands of categories, many of them become infeasible in computation or memory consumption when applied to large-scale datasets with more than 50,000 categories. In this paper, we provide a viable solution for classifying large-scale species datasets that combines traditional CV techniques, such as feature extraction and processing and Bag of Visual Words (BOVW), and statistical learning techniques, such as Mini-Batch K-Means and SVM, with a neural network model. When applying these techniques, we optimize their time and memory consumption so that they remain feasible for large-scale datasets. We also use several techniques to reduce the impact of mislabeled data. We use a dataset with more than 50,000 categories, and all operations are performed on a common computer with 16GB of RAM and a 3.0GHz CPU. Our contributions are: 1) we analyze the problems that may arise during training and present several feasible ways to solve them; 2) we combine traditional CV models with neural network models to provide feasible approaches for training on large-scale classification datasets within the constraints of time and space resources.
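As a rough illustration of why Mini-Batch K-Means suits a memory-constrained BOVW pipeline, the sketch below builds a visual-word codebook from small random batches of local descriptors and then encodes one image as a normalized word histogram. It is a minimal standard-library sketch, not the authors' implementation: the function names, the batch size, the iteration count, and the use of plain lists as descriptors are all assumptions made for illustration, and the abstract does not specify which features or parameters were actually used.

```python
import random


def nearest(centers, x):
    """Return the index of the center closest to x (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for k, c in enumerate(centers):
        d = sum((a - b) ** 2 for a, b in zip(c, x))
        if d < best_d:
            best, best_d = k, d
    return best


def mini_batch_kmeans(descriptors, n_words, batch_size=32, n_iter=100, seed=0):
    """Learn n_words cluster centers (the visual vocabulary).

    Each iteration touches only one small random batch, so the working set
    stays bounded regardless of how many descriptors exist in total.
    All default parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Initialize centers from randomly chosen descriptors.
    centers = [list(d) for d in rng.sample(descriptors, n_words)]
    counts = [0] * n_words
    for _ in range(n_iter):
        batch = rng.sample(descriptors, min(batch_size, len(descriptors)))
        # Assign the whole batch first, then update the centers.
        assigned = [nearest(centers, x) for x in batch]
        for x, k in zip(batch, assigned):
            counts[k] += 1
            eta = 1.0 / counts[k]  # per-center learning rate decays over time
            centers[k] = [(1 - eta) * c + eta * xi
                          for c, xi in zip(centers[k], x)]
    return centers


def bovw_histogram(image_descriptors, centers):
    """Encode one image as an L1-normalized histogram of visual words."""
    hist = [0.0] * len(centers)
    for x in image_descriptors:
        hist[nearest(centers, x)] += 1.0
    total = sum(hist) or 1.0  # avoid division by zero for empty images
    return [h / total for h in hist]
```

In a full pipeline, the resulting fixed-length histograms would be passed to a classifier such as an SVM; that step, and the combination with a neural network, are not shown here.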

