Analogical reasoning lies at the core of human cognition and remains a fundamental challenge for artificial intelligence. Raven's Progressive Matrices (RPM) serve as a widely used benchmark for abstract reasoning, requiring the inference of underlying structural rules. Although many vision-based and language-based models have achieved success on RPM tasks, it remains unclear whether their performance reflects genuine reasoning ability or reliance on statistical shortcuts. This study investigates the generalization capacity of modern AI systems under incomplete training, in which several structural rules are deliberately withheld from the training data. Both sequence-to-sequence transformer models and vision-based architectures such as CoPINet and the Dual-Contrast Network are evaluated on the Impartial-RAVEN (I-RAVEN) dataset. Experiments reveal that although transformers perform strongly on familiar rules, their accuracy declines sharply when faced with novel or withheld rules. Moreover, the gap between token-level accuracy and complete-answer accuracy highlights fundamental limitations of current approaches. These findings offer new insight into the reasoning mechanisms of deep learning models and underscore the need for architectures that move beyond pattern recognition toward robust abstract reasoning.