Vision models excel at image classification but struggle to generalize to unseen data, for example when classifying images from unseen domains or discovering novel categories. In this paper, we explore the relationship between logical reasoning and deep learning generalization in visual classification. We derive a logical regularization, termed L-Reg, that bridges a logical analysis framework and image classification. Our work reveals that L-Reg reduces model complexity in terms of both the feature distribution and the classifier weights. We also show that L-Reg improves interpretability: it enables the model to extract salient features for classification, such as faces for the person class. Theoretical analysis and experiments demonstrate that L-Reg enhances generalization across various scenarios, including multi-domain generalization and generalized category discovery. In complex real-world scenarios where images span unknown classes and unseen domains, L-Reg consistently improves generalization, highlighting its practical efficacy.