Heteroscedastic classifiers, which learn a multivariate Gaussian distribution over prediction logits, have been shown to perform well on image classification problems with hundreds to thousands of classes. However, compared to standard classifiers, they introduce extra parameters whose number scales linearly with the number of classes, which makes them infeasible to apply to larger-scale problems. In addition, heteroscedastic classifiers introduce a critical temperature hyperparameter that must be tuned. We propose HET-XL, a heteroscedastic classifier whose additional parameter count, relative to a standard classifier, is independent of the number of classes. In our large-scale settings, we show that the temperature hyperparameter no longer needs to be tuned, because it can be learned directly on the training data. On large image classification datasets with up to 4B images and 30k classes, our method requires 14x fewer additional parameters, does not require tuning the temperature on a held-out set, and performs consistently better than the baseline heteroscedastic classifier. HET-XL also improves ImageNet zero-shot classification in a multimodal contrastive learning setup, which can be viewed as a 3.5 billion class classification problem.
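As a rough illustration of the general idea behind a heteroscedastic classifier (a Gaussian distribution over logits combined with a softmax temperature), the following minimal sketch estimates class probabilities by Monte Carlo sampling. It is not the HET-XL method. The function name `heteroscedastic_probs`, the diagonal-covariance simplification, and the sample count are assumptions made for illustration only. The temperature is passed in as a plain argument here, whereas the abstract describes learning it on the training data.

```python
import numpy as np


def heteroscedastic_probs(mean_logits, scale_logits, temperature,
                          num_samples=1000, seed=0):
    """Monte Carlo estimate of class probabilities under Gaussian logit noise.

    Hypothetical helper for illustration; assumes a diagonal covariance,
    which is a simplification of a full multivariate Gaussian.

    mean_logits:  shape (K,), mean of the logit distribution
    scale_logits: shape (K,), per-class standard deviation of the logits
    temperature:  positive scalar dividing the sampled logits before softmax
    """
    rng = np.random.default_rng(seed)
    mean_logits = np.asarray(mean_logits, dtype=float)
    scale_logits = np.asarray(scale_logits, dtype=float)

    # Draw logit samples: mean + scale * standard normal noise.
    noise = rng.standard_normal((num_samples, mean_logits.shape[0]))
    sampled = mean_logits + scale_logits * noise

    # Temperature-scaled, numerically stable softmax for each sample.
    scaled = sampled / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    probs = np.exp(scaled)
    probs = probs / probs.sum(axis=1, keepdims=True)

    # Average the per-sample probabilities over the Monte Carlo samples.
    return probs.mean(axis=0)
```

In this sketch the per-class scale vector has one entry per class, which mirrors the linear growth in extra parameters that the abstract identifies as the obstacle to scaling; how HET-XL avoids that growth is not specified in the abstract and is not shown here.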