Humans excel at discovering regular structures from limited samples and applying inferred rules to novel settings. We investigate whether modern generative models can similarly learn underlying rules from finite samples and perform reasoning through conditional sampling. Inspired by Raven's Progressive Matrices task, we designed the GenRAVEN dataset, in which each sample consists of three rows and one of 40 relational rules governing object position, number, or attributes applies to all rows. We trained generative models to learn the data distribution, encoding samples as integer arrays to focus on rule learning. We compared two generative model families: diffusion models (EDM, DiT, SiT) and autoregressive models (GPT2, Mamba). We evaluated their ability to generate structurally consistent samples and to perform panel completion via unconditional and conditional sampling. We found that diffusion models excel at unconditional generation, producing more novel and consistent samples from scratch while memorizing less, but they perform worse at panel completion, even with advanced conditional sampling methods. Conversely, autoregressive models excel at completing missing panels in a rule-consistent manner but generate less consistent samples unconditionally. We observe diverse data scaling behaviors: for both model families, rule learning emerges only beyond a certain dataset size, on the order of thousands of examples per rule. With more training data, diffusion models improve both their unconditional and conditional generation capabilities. For autoregressive models, however, panel completion improves with more training data while unconditional generation consistency declines. Our findings highlight complementary capabilities and limitations of diffusion and autoregressive models in rule learning and reasoning tasks, suggesting avenues for further research into their mechanisms and potential for human-like reasoning.
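To make the setup concrete, the sketch below illustrates one way a GenRAVEN-style sample could be represented as an integer array and how a panel-completion query might be posed to a conditional sampler. The array shape, attribute slots, and mask value are illustrative assumptions, not the dataset's actual encoding.

```python
import numpy as np

# Hypothetical GenRAVEN-style encoding (shapes and attribute slots are
# illustrative assumptions, not the paper's exact format): each sample has
# 3 rows x 3 panels; each panel holds up to 9 object slots, and each slot
# stores integer attributes (e.g. presence, shape, size, color).
NUM_ROWS, NUM_PANELS, NUM_SLOTS, NUM_ATTRS = 3, 3, 9, 4

def random_sample(rng):
    """Draw a toy integer-array sample (no relational rule is enforced here)."""
    return rng.integers(0, 8, size=(NUM_ROWS, NUM_PANELS, NUM_SLOTS, NUM_ATTRS))

def panel_completion_query(sample):
    """Mask the final panel of the last row, as in RPM-style panel completion.

    A conditional sampler (diffusion inpainting or autoregressive decoding)
    would be asked to fill in the masked entries given the visible context.
    """
    context = sample.copy()
    mask = np.zeros_like(sample, dtype=bool)
    mask[-1, -1] = True      # the missing bottom-right panel
    context[mask] = -1       # -1 marks positions to be generated
    return context, mask

rng = np.random.default_rng(0)
sample = random_sample(rng)
context, mask = panel_completion_query(sample)
print(context.shape, int(mask.sum()), "entries to complete")
```

Under this toy encoding, unconditional generation corresponds to sampling the full array from scratch, whereas panel completion conditions on every entry except the masked panel.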