Designing urban spaces that provide pedestrian wind comfort and safety requires time-resolved Computational Fluid Dynamics (CFD) simulations, but their current computational cost makes extensive design exploration impractical. We introduce WinDiNet (Wind Diffusion Network), a pretrained video diffusion model that is repurposed as a fast, differentiable surrogate for this task. Starting from LTX-Video, a 2B-parameter latent video transformer, we fine-tune on 10,000 2D incompressible CFD simulations over procedurally generated building layouts. A systematic study of training regimes, conditioning mechanisms, and VAE adaptation strategies, including a physics-informed decoder loss, identifies a configuration that outperforms purpose-built neural PDE solvers. The resulting model generates full 112-frame rollouts in under a second. As the surrogate is end-to-end differentiable, it doubles as a physics simulator for gradient-based inverse optimization: given an urban footprint layout, we optimize building positions directly through backpropagation to improve wind safety as well as pedestrian wind comfort. Experiments on single- and multi-inlet layouts show that the optimizer discovers effective layouts even under challenging multi-objective configurations, with all improvements confirmed by ground-truth CFD simulations.
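The inverse-design idea in the abstract (adjusting building positions by following gradients of a wind objective through a differentiable surrogate) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the function names, the Gaussian "shelter" model, and the loss are hypothetical and are not WinDiNet or its objective. The gradient is written out by hand so the example needs only the standard library; with a real neural surrogate, an autodiff framework would supply the same quantity through backpropagation.

```python
import math

def gaussian(building, probe, radius):
    """Shelter contribution of one building at one probe point."""
    dx, dy = probe[0] - building[0], probe[1] - building[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * radius ** 2))

def speed(buildings, probe, inflow=1.0, radius=1.0):
    """Toy stand-in for a surrogate: inflow speed attenuated by nearby buildings."""
    shelter = sum(gaussian(b, probe, radius) for b in buildings)
    return inflow * math.exp(-shelter)

def loss(buildings, probes, radius=1.0):
    """Hypothetical comfort objective: total wind speed over pedestrian probes."""
    return sum(speed(buildings, p, radius=radius) for p in probes)

def loss_grad(buildings, probes, radius=1.0):
    """Analytic gradient of the loss with respect to each building position."""
    grads = []
    for b in buildings:
        gx = gy = 0.0
        for p in probes:
            s = speed(buildings, p, radius=radius)
            g = gaussian(b, p, radius)
            # d(speed)/d(bx) = -speed * d(shelter)/d(bx) = -s * g * (px - bx) / r^2
            gx += -s * g * (p[0] - b[0]) / radius ** 2
            gy += -s * g * (p[1] - b[1]) / radius ** 2
        grads.append((gx, gy))
    return grads

def optimize(buildings, probes, lr=0.2, steps=100):
    """Plain gradient descent on building positions."""
    buildings = [tuple(b) for b in buildings]
    for _ in range(steps):
        grads = loss_grad(buildings, probes)
        buildings = [(bx - lr * gx, by - lr * gy)
                     for (bx, by), (gx, gy) in zip(buildings, grads)]
    return buildings

if __name__ == "__main__":
    start = [(0.0, 0.0), (3.0, 0.0)]
    probes = [(1.0, 1.0), (2.0, 1.0)]
    final = optimize(start, probes)
    print("initial loss:", loss(start, probes))
    print("final loss:  ", loss(final, probes))
```

This toy ignores everything that makes the real problem hard, such as footprint constraints, time-resolved flow, and competing safety and comfort objectives. It only shows the shape of the loop: evaluate a differentiable surrogate, take the gradient with respect to layout parameters, and update the layout.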