GENIUS: Generative Fluid Intelligence Evaluation Suite

Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet existing benchmarks predominantly assess $\textit{Crystallized Intelligence}$, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks $\textit{Generative Fluid Intelligence (GFI)}$: the capacity to induce patterns, reason through constraints, and adapt to novel scenarios on the fly. To rigorously assess this capability, we introduce $\textbf{GENIUS}$ ($\textbf{GEN}$erative Fluid $\textbf{I}$ntelligence Eval$\textbf{U}$ation $\textbf{S}$uite). We formalize $\textit{GFI}$ as a synthesis of three primitives: $\textit{Inducing Implicit Patterns}$ (e.g., inferring personalized visual preferences), $\textit{Executing Ad-hoc Constraints}$ (e.g., visualizing abstract metaphors), and $\textit{Adapting to Contextual Knowledge}$ (e.g., simulating counter-intuitive physics). Collectively, these primitives challenge models to solve problems grounded entirely in the immediate context. Our systematic evaluation of 12 representative models reveals significant performance deficits on these tasks. Crucially, our diagnostic analysis disentangles these failure modes, demonstrating that the deficits stem from limited context comprehension rather than insufficient intrinsic generative capability. To bridge this gap, we propose a training-free attention intervention strategy. Ultimately, $\textbf{GENIUS}$ establishes a rigorous standard for $\textit{GFI}$, guiding the field beyond knowledge utilization toward dynamic, general-purpose reasoning. Our dataset and code will be released at: $\href{https://github.com/arctanxarc/GENIUS}{https://github.com/arctanxarc/GENIUS}$.

翻译：统一多模态模型（UMMs）在视觉生成方面已展现出显著进展。然而，现有基准主要评估**晶体智能**，即依赖回忆累积知识和习得模式的能力。这种关注点忽视了**生成式流体智能（GFI）**：即实时归纳模式、通过约束进行推理以及适应新场景的能力。为严格评估这一能力，我们提出了**GENIUS**（**GEN** Fluid **I**ntelligence Eval**U**ation **S**uite）。我们将**GFI**形式化为三种基本能力的综合，包括**归纳隐含模式**（例如，推断个性化视觉偏好）、**执行即时约束**（例如，可视化抽象隐喻）以及**适应情境知识**（例如，模拟反直觉物理现象）。这些基本能力共同挑战模型完全基于即时情境解决问题的能力。我们对12个代表性模型的系统评估揭示了这些任务中的显著性能缺陷。关键的是，我们的诊断分析解构了这些失败模式，证明缺陷源于有限的情境理解能力，而非内在生成能力不足。为弥合这一差距，我们提出了一种免训练的注意力干预策略。最终，**GENIUS**为**GFI**建立了严格标准，引导该领域超越知识利用，迈向动态、通用推理。我们的数据集与代码将在以下地址发布：https://github.com/arctanxarc/GENIUS。