Designing urban spaces that provide pedestrian wind comfort and safety requires time-resolved Computational Fluid Dynamics (CFD) simulations, but their computational cost makes extensive design exploration impractical. We introduce WinDiNet (Wind Diffusion Network), a pretrained video diffusion model repurposed as a fast, differentiable surrogate for this task. Starting from LTX-Video, a 2B-parameter latent video transformer, we fine-tune on 10,000 2D incompressible CFD simulations over procedurally generated building layouts. A systematic study of training regimes, conditioning mechanisms, and VAE adaptation strategies, including a physics-informed decoder loss, identifies a configuration that outperforms purpose-built neural PDE solvers. The resulting model generates full 112-frame rollouts in under a second. Because the surrogate is end-to-end differentiable, it doubles as a physics simulator for gradient-based inverse optimization: given an urban footprint layout, we optimize building positions directly through backpropagation to improve both wind safety and pedestrian wind comfort. Experiments on single- and multi-inlet layouts show that the optimizer discovers effective layouts even under challenging multi-objective configurations, with all improvements confirmed by ground-truth CFD simulations.
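To make the inverse-optimization idea concrete, the following is a minimal sketch of gradient descent on a building position through a differentiable wind model. It is not WinDiNet: the surrogate here is a hypothetical Gaussian "shelter" function, the gradient is derived by hand rather than obtained by backpropagation, and every constant, probe location, and function name is an assumption chosen for illustration.

```python
import math

# All values below are illustrative assumptions, not taken from the paper.
INLET_SPEED = 8.0     # m/s, uniform inflow speed
WAKE_SIGMA = 20.0     # m, width of the sheltered zone behind the building
WAKE_OFFSET = 15.0    # m, downstream shift of the sheltered zone
COMFORT_LIMIT = 5.0   # m/s, speed above which a probe counts as uncomfortable
PROBES = [(60.0, 40.0), (65.0, 50.0), (70.0, 45.0)]  # pedestrian probe points


def loss_and_grad(bx, by):
    """Toy comfort loss and its hand-derived gradient w.r.t. building position.

    Speed at a probe is INLET_SPEED * (1 - s), where s is a Gaussian shelter
    factor centred WAKE_OFFSET metres downstream of the building at (bx, by).
    The loss is the sum of squared speed excess over COMFORT_LIMIT.
    """
    loss, gx, gy = 0.0, 0.0, 0.0
    for px, py in PROBES:
        dx = px - (bx + WAKE_OFFSET)
        dy = py - by
        s = math.exp(-(dx * dx + dy * dy) / (2.0 * WAKE_SIGMA ** 2))
        speed = INLET_SPEED * (1.0 - s)
        excess = max(speed - COMFORT_LIMIT, 0.0)
        loss += excess ** 2
        # ds/dbx = s * dx / sigma^2 and ds/dby = s * dy / sigma^2,
        # so dspeed/dbx = -INLET_SPEED * s * dx / sigma^2 (same for y).
        dspeed_dbx = -INLET_SPEED * s * dx / WAKE_SIGMA ** 2
        dspeed_dby = -INLET_SPEED * s * dy / WAKE_SIGMA ** 2
        gx += 2.0 * excess * dspeed_dbx
        gy += 2.0 * excess * dspeed_dby
    return loss, gx, gy


def optimize_building(bx, by, steps=100, lr=5.0):
    """Plain gradient descent on the building position; returns the loss history."""
    losses = []
    for _ in range(steps):
        loss, gx, gy = loss_and_grad(bx, by)
        losses.append(loss)
        bx -= lr * gx
        by -= lr * gy
    losses.append(loss_and_grad(bx, by)[0])
    return bx, by, losses


if __name__ == "__main__":
    bx, by, losses = optimize_building(20.0, 20.0)
    print(f"final position: ({bx:.1f}, {by:.1f})")
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In a setting like the one described above, the hand-written `loss_and_grad` would presumably be replaced by a forward pass through the learned surrogate followed by automatic differentiation, and any improvement found this way would still need to be checked against a ground-truth CFD run.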