Despite deep learning's broad success, it remains bottlenecked on abstract reasoning. We tackle Raven's Progressive Matrices (RPM), a benchmark for pattern recognition, reasoning, and problem-solving intelligence. We model the full causal chain image $\rightarrow$ attributes $\rightarrow$ progressive patterns $\rightarrow$ consistency $\rightarrow$ answer and build a baseline model, DIO, on it. However, DIO's objective, a lower bound on mutual information, does not embed human logic: the bound is loose and purely statistical, ignoring causal subject-object relations. We therefore present three refinements. 1) Brando introduces trainable negative options to tighten the variational bound. 2) WORLD replaces the generation process with a Gaussian-mixture feature model that supplies an unlimited number of weighted negative samples, tightening the bound further. 3) DIEGO adds metadata supervision to close the semantic gap in the "attributes $\rightarrow$ patterns" step, aligning the learned representations with human rules. Together these refinements substantially improve accuracy on discriminative RPM and, for the first time, enable DIO to generate valid answers in open-ended RPM. The work offers causally driven design guidelines, strategies for refining objectives, and cross-modal insights for abstract-reasoning research.
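The abstract does not state the exact form of DIO's mutual-information bound. The sketch below assumes an InfoNCE-style contrastive bound, a common choice that the source does not confirm, to illustrate why adding negatives can tighten such a bound. The per-example estimate $\log K + f(c, y^+) - \log \sum_{j=1}^{K} e^{f(c, y_j)}$ can never exceed $\log K$, so with only the seven distractors of an eight-option RPM item it is capped at $\log 8$. Drawing extra negatives from a Gaussian mixture, a hypothetical stand-in for WORLD's feature model, raises that cap. The function names, the dot-product score $f$, the finite sample sizes, and all numbers are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def dot(a, b):
    """Dot product used here as the (assumed) critic score f(c, y)."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_bound(context, positive, negatives):
    """Per-example InfoNCE estimate: log K + f(c, y+) - log sum_j exp f(c, y_j).

    K counts the positive plus all negatives, so the estimate is at most log K.
    """
    scores = [dot(context, positive)] + [dot(context, n) for n in negatives]
    k = len(scores)
    top = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
    return math.log(k) + scores[0] - log_sum


def sample_gmm_negatives(weights, means, std, n, rng):
    """Draw n feature vectors from an isotropic Gaussian mixture.

    Hypothetical stand-in for a feature-space negative generator; a finite
    sample approximates the "unlimited" supply described in the abstract.
    """
    negatives = []
    for _ in range(n):
        mu = rng.choices(means, weights=weights)[0]  # pick a mixture component
        negatives.append([rng.gauss(m, std) for m in mu])
    return negatives


rng = random.Random(0)
context = [1.0, 0.0]
positive = [3.0, 0.0]                 # correct answer, score f = 3.0
means = [[0.0, 1.0], [-1.0, 0.0]]     # illustrative mixture components
weights = [0.7, 0.3]

few = sample_gmm_negatives(weights, means, 0.5, 7, rng)      # 8 options in total
many = sample_gmm_negatives(weights, means, 0.5, 1023, rng)  # 1024 options in total

b_few = info_nce_bound(context, positive, few)
b_many = info_nce_bound(context, positive, many)
print(f"K=8:    estimate {b_few:.3f}, cap log 8    = {math.log(8):.3f}")
print(f"K=1024: estimate {b_many:.3f}, cap log 1024 = {math.log(1024):.3f}")
```

With only seven distractors the estimate is squeezed under $\log 8 \approx 2.08$ regardless of how well the critic separates the answer; with many mixture-sampled negatives the cap rises and the estimate can exceed that value. Weighting the negatives, as the abstract describes for WORLD, is not modelled in this sketch.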