Text-to-image (T2I) generation has achieved remarkable progress, yet existing methods often lack the ability to dynamically reason and refine during generation, a hallmark of human creativity. Current reasoning-augmented paradigms mostly rely on explicit thought processes, in which intermediate reasoning is decoded into discrete text at fixed steps with frequent image decoding and re-encoding, leading to inefficiency, information loss, and cognitive mismatch. To bridge this gap, we introduce LatentMorph, a novel framework that seamlessly integrates implicit latent reasoning into the T2I generation process. At its core, LatentMorph comprises four lightweight components: (i) a condenser that summarizes intermediate generation states into compact visual memory, (ii) a translator that converts latent thoughts into actionable guidance, (iii) a shaper that dynamically steers next-image-token predictions, and (iv) an RL-trained invoker that adaptively decides when to invoke reasoning. By performing reasoning entirely in continuous latent spaces, LatentMorph avoids the bottlenecks of explicit reasoning and enables more adaptive self-refinement. Extensive experiments demonstrate that LatentMorph (I) improves the base model Janus-Pro by $16\%$ on GenEval and $25\%$ on T2I-CompBench; (II) outperforms explicit paradigms (e.g., TwiG) by $15\%$ and $11\%$ on abstract reasoning tasks such as WISE and IPV-Txt; (III) reduces inference time by $44\%$ and token consumption by $51\%$; and (IV) exhibits $71\%$ cognitive alignment with human intuition on when to invoke reasoning.
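To make the four-component design concrete, the sketch below shows one generation step with optional latent reasoning. This is a minimal illustration, not the paper's implementation: all dimensions, weight matrices, function names, and the mean-pooling/sigmoid choices are hypothetical stand-ins for the learned condenser, translator, shaper, and invoker modules.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only, not from the paper)
HIDDEN, MEMORY, VOCAB = 64, 16, 100

# (i) Condenser: compress intermediate generation states into compact visual memory
def condense(hidden_states, W_c):
    pooled = hidden_states.mean(axis=0)      # pool over generated image tokens: (HIDDEN,)
    return np.tanh(W_c @ pooled)             # compact visual memory: (MEMORY,)

# (ii) Translator: convert latent thoughts into actionable guidance
def translate(memory, W_t):
    return W_t @ memory                      # guidance vector in hidden space: (HIDDEN,)

# (iii) Shaper: steer the next-image-token prediction with the guidance vector
def shape_logits(logits, guidance, W_s, alpha=0.5):
    return logits + alpha * (W_s @ guidance)  # additively adjusted logits: (VOCAB,)

# (iv) Invoker: adaptively decide whether to trigger latent reasoning at this step
def should_invoke(hidden_states, w_i, threshold=0.5):
    score = 1.0 / (1.0 + np.exp(-w_i @ hidden_states.mean(axis=0)))  # sigmoid gate
    return score > threshold

# One decoding step: hypothetical random weights stand in for trained parameters
W_c = rng.normal(size=(MEMORY, HIDDEN)) * 0.1
W_t = rng.normal(size=(HIDDEN, MEMORY)) * 0.1
W_s = rng.normal(size=(VOCAB, HIDDEN)) * 0.1
w_i = rng.normal(size=HIDDEN)

hidden_states = rng.normal(size=(10, HIDDEN))  # states of 10 already-generated tokens
logits = rng.normal(size=VOCAB)                # raw next-token logits

# Reasoning happens entirely in continuous latent space: no text decoding,
# no image decode/re-encode round trip.
if should_invoke(hidden_states, w_i):
    memory = condense(hidden_states, W_c)
    guidance = translate(memory, W_t)
    logits = shape_logits(logits, guidance, W_s)

next_token = int(np.argmax(logits))
print(next_token)
```

Because the invoker gates the whole latent-reasoning path, the condenser/translator/shaper cost is paid only at steps the gate selects, which is the mechanism behind the reported inference-time and token savings over always-on explicit reasoning.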