We present a diffusion-based framework for document-centric background generation that achieves foreground preservation and multi-page stylistic consistency through latent-space design rather than explicit constraints. Instead of suppressing diffusion updates or applying masking heuristics, our approach reinterprets diffusion as the evolution of stochastic trajectories through a structured latent space. By shaping the initial noise and its geometric alignment, background generation naturally avoids designated foreground regions, allowing readable content to remain intact without auxiliary mechanisms. To address the long-standing issue of stylistic drift across pages, we decouple style control from text conditioning and introduce cached style directions as persistent vectors in latent space. Once selected, these directions constrain diffusion trajectories to a shared stylistic subspace, ensuring consistent appearance across pages and editing iterations. This formulation eliminates the need for repeated prompt-based style specification and provides a more stable foundation for multi-page generation. Our framework admits a geometric and physical interpretation, where diffusion paths evolve on a latent manifold shaped by preferred directions, and foreground regions are rarely traversed as a consequence of trajectory initialization rather than explicit exclusion. The proposed method is training-free, compatible with existing diffusion backbones, and produces visually coherent, foreground-preserving results across complex documents. By reframing diffusion as trajectory design in latent space, we offer a principled approach to consistent and structured generative modeling.
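The claim that shaping the initial noise keeps trajectories out of foreground regions can be illustrated with a toy linear-algebra sketch. This is not the paper's implementation: the dimensions, the orthonormal basis `F` standing in for "foreground directions", and the variable names are all illustrative assumptions. The sketch only shows the core geometric idea, that projecting the initial noise onto the orthogonal complement of a designated subspace removes all of its energy along those directions before sampling begins.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 8  # latent dimension and number of foreground directions (illustrative sizes)

# Hypothetical orthonormal basis spanning the "foreground" subspace of the latent.
F, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Raw Gaussian initial noise for a diffusion trajectory.
z0 = rng.standard_normal(d)

# Shape the initial noise: subtract its component inside the foreground subspace,
# so the trajectory starts with no energy along any foreground direction.
z0_shaped = z0 - F @ (F.T @ z0)

# Residual energy of the shaped noise along the foreground directions.
foreground_energy = np.linalg.norm(F.T @ z0_shaped)
```

Under this toy model, foreground avoidance is a property of where the trajectory starts, rather than of a mask applied during sampling, which mirrors the abstract's framing of "trajectory initialization rather than explicit exclusion".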
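The cached-style-direction idea can likewise be sketched in a few lines. Again this is a simplified assumption-laden illustration, not the actual method: `style_dir`, `alpha`, and `constrain_to_style` are hypothetical names, and the real system operates on diffusion latents rather than plain vectors. The sketch shows how pinning every page's latent to the same component along one persistent direction yields a shared stylistic coordinate without re-specifying style in the prompt.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64  # illustrative latent dimension

# Hypothetical cached style direction: a persistent unit vector in latent space.
style_dir = rng.standard_normal(d)
style_dir /= np.linalg.norm(style_dir)

# Style coefficient selected once (e.g. for the first page) and then reused.
alpha = 2.5

def constrain_to_style(z, s, a):
    """Pin the latent's component along the unit style direction s to the cached value a."""
    return z - s * (s @ z) + a * s

# Two pages start from independent noise but share the same style component.
page1 = constrain_to_style(rng.standard_normal(d), style_dir, alpha)
page2 = constrain_to_style(rng.standard_normal(d), style_dir, alpha)
```

Because both pages carry an identical projection onto `style_dir`, their stylistic coordinate is fixed across pages and editing iterations, which is the consistency property the abstract attributes to cached style directions.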