The Abstraction and Reasoning Corpus (ARC) tests in-context rule induction: given a few input-output demonstrations, a model must infer the hidden rule and apply it to a new query. Many approaches express ARC rules through language, code, or symbolic programs, yet ARC itself is visual-symbolic: rules appear as grid transitions over objects, colors, shapes, and spatial relations. We introduce Loop-OWM, an object-centric world-modeling architecture that learns these rules as composable transitions over structured states. It combines color-prototype slots, demonstration-conditioned task summaries, and a looped transition model with dense propagation and slot-conditioned correction. On both ARC-1 and ARC-2, Loop-OWM outperforms non-looped and looped baselines while using a comparable or smaller number of parameters. These results suggest that ARC rules can be learned not only as language descriptions or searched programs, but also as transitions over visual-symbolic world states.
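To make the task setup concrete, the following is a minimal sketch of in-context rule induction on ARC-style grids. It is not Loop-OWM: it assumes a deliberately tiny rule family (a per-cell color substitution), and the names `induce_color_map` and `apply_color_map` are hypothetical helpers introduced only for illustration. It shows the shape of the problem: demonstrations go in, a rule is inferred, and the rule is applied to a new query.

```python
from typing import Dict, List, Optional, Tuple

# ARC grids are small 2D arrays of color indices (0-9).
Grid = List[List[int]]


def induce_color_map(demos: List[Tuple[Grid, Grid]]) -> Optional[Dict[int, int]]:
    """Infer a per-cell color substitution consistent with every demonstration.

    Toy rule family (an assumption of this sketch): each input color maps to
    exactly one output color. Returns None if the demonstrations cannot be
    explained by such a rule.
    """
    mapping: Dict[int, int] = {}
    for inp, out in demos:
        # A per-cell substitution requires identical grid shapes.
        if len(inp) != len(out):
            return None
        for row_in, row_out in zip(inp, out):
            if len(row_in) != len(row_out):
                return None
            for a, b in zip(row_in, row_out):
                # setdefault records the first observed target color;
                # any later conflict means the rule family does not fit.
                if mapping.setdefault(a, b) != b:
                    return None
    return mapping


def apply_color_map(mapping: Dict[int, int], grid: Grid) -> Grid:
    """Apply the induced substitution; unseen colors are left unchanged."""
    return [[mapping.get(c, c) for c in row] for row in grid]


# Two demonstrations of the hidden rule "recolor 1 -> 2, keep 0".
demos = [
    ([[1, 0], [0, 1]], [[2, 0], [0, 2]]),
    ([[1, 1], [0, 0]], [[2, 2], [0, 0]]),
]
query = [[0, 1], [1, 1]]

rule = induce_color_map(demos)
prediction = apply_color_map(rule, query) if rule is not None else None
print(prediction)
```

Real ARC rules involve objects, shapes, and spatial relations rather than a fixed color table, which is the gap that learned transition models such as the one described above aim to close.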