Large-scale deep models pre-trained on massive labeled or unlabeled data transfer well to downstream tasks. Linear evaluation freezes the parameters of the pre-trained model and trains a linear classifier separately, which makes it an efficient and attractive approach to transfer. However, little work has investigated classifiers for linear evaluation other than the default logistic regression. Inspired by the statistical efficiency of naive Bayes, this paper revisits the classical topic of discriminative vs. generative classifiers. Theoretically, we analyze the surrogate loss instead of the zero-one loss and generalize the classical results from the binary case to the multiclass case. We show that, under mild assumptions, multiclass naive Bayes requires $O(\log n)$ samples to approach its asymptotic error, while the corresponding multiclass logistic regression requires $O(n)$ samples, where $n$ is the feature dimension. To establish this, we present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for the logistic loss, both of which are of independent interest. Simulation results on a mixture of Gaussians validate our theoretical findings. Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the amount of training data increases. In addition, naive Bayes shows promise in few-shot cases, and we observe the ``two regimes'' phenomenon in pre-trained supervised models. Our code is available at https://github.com/ML-GSAI/Revisiting-Dis-vs-Gen-Classifiers.
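The comparison described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code (which lives in the repository linked above): it draws data from a mixture of unit-variance Gaussians, fits a Gaussian naive Bayes classifier and a multiclass logistic regression on training sets of increasing size, and reports the test error of each. All function names, the pooled-variance naive Bayes variant, the gradient-descent solver, and the chosen dimensions and sample sizes are assumptions made for illustration only.

```python
import numpy as np

def sample(rng, means, m):
    """Draw m class-balanced samples from unit-variance Gaussians centred at `means`."""
    y = np.arange(m) % len(means)
    x = means[y] + rng.standard_normal((m, means.shape[1]))
    return x, y

def fit_naive_bayes(x, y, k):
    """Gaussian naive Bayes with a pooled per-feature variance (assumed variant)."""
    mu = np.stack([x[y == c].mean(axis=0) for c in range(k)])
    var = ((x - mu[y]) ** 2).mean(axis=0) + 1e-2  # small smoothing term
    log_prior = np.log(np.bincount(y, minlength=k) / len(y))
    return mu, var, log_prior

def predict_naive_bayes(params, x):
    mu, var, log_prior = params
    # Variance-scaled squared distance of every sample to every class mean.
    sq = ((x[:, None, :] - mu[None, :, :]) ** 2 / var).sum(axis=2)
    return np.argmax(log_prior - 0.5 * sq, axis=1)

def fit_logistic(x, y, k, steps=500, l2=1e-3):
    """Multiclass logistic regression fitted by plain gradient descent."""
    xb = np.hstack([x, np.ones((len(x), 1))])  # append a bias feature
    w = np.zeros((xb.shape[1], k))
    onehot = np.eye(k)[y]
    # Step size from a smoothness bound on the mean cross-entropy loss.
    lr = 1.0 / (0.5 * np.linalg.norm(xb, 2) ** 2 / len(xb) + l2)
    for _ in range(steps):
        z = xb @ w
        z -= z.max(axis=1, keepdims=True)  # stabilise the softmax
        p = np.exp(z)
        p /= p.sum(axis=1, keepdims=True)
        w -= lr * (xb.T @ (p - onehot) / len(xb) + l2 * w)
    return w

def predict_logistic(w, x):
    xb = np.hstack([x, np.ones((len(x), 1))])
    return np.argmax(xb @ w, axis=1)

def compare(n_dim=50, k=3, train_sizes=(6, 30, 300), n_test=3000, seed=0):
    """Return (train size, naive Bayes test error, logistic test error) rows."""
    rng = np.random.default_rng(seed)
    means = 0.5 * rng.standard_normal((k, n_dim))  # hypothetical class means
    x_te, y_te = sample(rng, means, n_test)
    rows = []
    for m in train_sizes:
        x_tr, y_tr = sample(rng, means, m)
        nb = fit_naive_bayes(x_tr, y_tr, k)
        w = fit_logistic(x_tr, y_tr, k)
        err_nb = float(np.mean(predict_naive_bayes(nb, x_te) != y_te))
        err_lr = float(np.mean(predict_logistic(w, x_te) != y_te))
        rows.append((m, err_nb, err_lr))
    return rows

if __name__ == "__main__":
    for m, err_nb, err_lr in compare():
        print(f"m={m:4d}  naive Bayes error={err_nb:.3f}  logistic error={err_lr:.3f}")
```

Varying `n_dim` and `train_sizes` gives a rough feel for how quickly each classifier approaches its own asymptotic error; a single seed and a toy solver are not a substitute for the paper's simulations or experiments.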