The comparison between discriminative and generative classifiers has intrigued researchers since Efron's seminal analysis of logistic regression versus discriminant analysis. While early theoretical work established that generative classifiers exhibit lower sample complexity but higher asymptotic error in simple linear settings, these trade-offs remain unexplored in the transformer era. We present the first comprehensive evaluation of modern generative and discriminative architectures for text classification: auto-regressive modeling, masked language modeling, discrete diffusion, and encoder-only models. Our study reveals that the classical 'two regimes' phenomenon manifests distinctly across architectures and training paradigms. Beyond accuracy, we analyze sample efficiency, calibration, noise robustness, and ordinality across diverse scenarios. Our findings offer practical guidance for selecting the most suitable modeling approach under real-world constraints such as latency and data limitations.