Flow matching has recently emerged as a principled framework for learning continuous-time transport maps, enabling efficient deterministic generation without relying on stochastic diffusion processes. While generative modeling has shown promise for medical image segmentation, particularly in capturing uncertainty and complex anatomical variability, existing approaches are predominantly built upon diffusion models, which incur substantial computational overhead due to iterative sampling and are often constrained by UNet-based parameterizations. In this work, we introduce MedFlowSeg, a conditional flow matching framework that formulates medical image segmentation as learning a time-dependent vector field that transports a simple prior distribution to the target segmentation distribution. This formulation enables one-step deterministic inference while preserving the expressiveness of generative modeling. We further develop a dual-conditioning mechanism to incorporate structured priors into the learned flow. Specifically, we propose a Dual-Branch Spatial Attention module that injects multi-scale structural information into the flow field, and a Frequency-Aware Attention module that models cross-domain interactions between spatial and spectral representations via discrepancy-aware fusion and time-dependent modulation. Together, these components provide an effective parameterization of conditional flows that capture both global anatomical structure and fine-grained boundary details. We provide extensive empirical validation across multiple medical imaging modalities, demonstrating that MedFlowSeg achieves state-of-the-art performance while significantly reducing computational cost compared to diffusion-based methods. Our results highlight the potential of flow matching as a theoretically grounded and computationally efficient alternative for generative medical image segmentation.
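The abstract's core formulation (a time-dependent vector field transporting a simple prior to the target segmentation distribution, trained by flow matching along linear interpolation paths) can be sketched in a few lines. This is a minimal illustrative example of the generic conditional flow matching objective, not the paper's MedFlowSeg implementation; all names are hypothetical, and the closed-form velocity stands in for the learned network to show why one-step inference is exact along straight paths.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_pair(x0, x1, t):
    """Linear interpolation path x_t and its target velocity for flow matching.

    Training regresses a network v_theta(x_t, t, condition) onto v_target;
    here we use the exact v_target directly for illustration.
    """
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

# Toy stand-ins: Gaussian prior sample and a binary "segmentation mask" target.
x0 = rng.standard_normal((4, 4))
x1 = (rng.random((4, 4)) > 0.5).astype(float)

t = 0.3
x_t, v = cfm_pair(x0, x1, t)

# With straight (linear) paths the velocity is constant in t, so a single
# Euler step over the remaining time recovers the target exactly --
# the mechanism behind one-step deterministic inference.
x1_hat = x_t + (1.0 - t) * v
assert np.allclose(x1_hat, x1)
```

In practice the exact velocity `x1 - x0` is replaced by a conditional network, and the one-step property holds only approximately, to the extent the learned field matches the straight-path target.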