What is the best paradigm for recognizing objects: discriminative inference (fast but potentially prone to shortcut learning) or a generative model (slow but potentially more robust)? We build on recent advances in generative modeling that turn text-to-image models into classifiers. This allows us to study their behavior and to compare them against discriminative models and human psychophysical data. We report four intriguing emergent properties of generative classifiers: they show a record-breaking human-like shape bias (99% for Imagen), near human-level out-of-distribution accuracy, state-of-the-art alignment with human classification errors, and they understand certain perceptual illusions. Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.
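To make the shape-bias figure concrete, here is a minimal sketch of how such a score is commonly computed on cue-conflict images, where the shape of one category is rendered with the texture of another. It assumes the usual definition: among trials where the classifier picked either the shape category or the texture category, the fraction that went to shape. The function name, the trial format, and the example labels are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple

# Each trial is (predicted_label, shape_label, texture_label).
# This tuple layout is an assumption made for illustration only.
Trial = Tuple[str, str, str]


def shape_bias(trials: Iterable[Trial]) -> float:
    """Fraction of shape decisions among shape-or-texture decisions."""
    shape_hits = 0
    texture_hits = 0
    for predicted, shape_label, texture_label in trials:
        if shape_label == texture_label:
            # Not a cue-conflict image, so it says nothing about the bias.
            continue
        if predicted == shape_label:
            shape_hits += 1
        elif predicted == texture_label:
            texture_hits += 1
        # Predictions matching neither cue are ignored.
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no trial matched either the shape or the texture label")
    return shape_hits / decided


# Made-up example: a cat-shaped image with elephant-skin texture, shown four times.
example_trials = [
    ("cat", "cat", "elephant"),       # shape decision
    ("elephant", "cat", "elephant"),  # texture decision
    ("cat", "cat", "elephant"),       # shape decision
    ("dog", "cat", "elephant"),       # neither cue, ignored
]
example_score = shape_bias(example_trials)
```

Under this definition, a score of 0.99 would mean that nearly every decided cue-conflict trial followed the object's shape rather than its texture.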