Phi-SegNet: Phase-Integrated Supervision for Medical Image Segmentation

Deep learning has substantially advanced medical image segmentation, yet achieving robust generalization across diverse imaging modalities and anatomical structures remains a major challenge. A key contributor to this limitation is that existing architectures, ranging from CNNs to Transformers and their hybrids, primarily encode spatial information while overlooking frequency-domain representations that capture rich structural and textural cues. Although a few recent studies have begun exploring spectral information at the feature level, supervision-level integration of frequency cues, which is crucial for fine-grained object localization, remains largely untapped. To this end, we propose Phi-SegNet, a CNN-based architecture that incorporates phase-aware information at both the architectural and optimization levels. The network integrates Bi-Feature Mask Former (BFMF) modules that blend neighboring encoder features to reduce semantic gaps, and Reverse Fourier Attention (RFA) blocks that refine decoder outputs using phase-regularized features. A dedicated phase-aware loss aligns these features with structural priors, forming a closed feedback loop that emphasizes boundary precision. Evaluated on five public datasets spanning X-ray, ultrasound (US), histopathology, MRI, and colonoscopy, Phi-SegNet consistently achieves state-of-the-art performance, with an average relative improvement of 1.54 ± 1.26% in IoU and 0.98 ± 0.71% in F1-score over the next best-performing model. In cross-dataset generalization scenarios involving unseen datasets from the known domain, Phi-SegNet also exhibits robust and superior performance, highlighting its adaptability and modality-agnostic design. These findings demonstrate the potential of leveraging spectral priors in both feature representation and supervision, paving the way for generalized segmentation frameworks that excel in fine-grained object localization.
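
To make the notion of phase-level supervision concrete, the following is a minimal illustrative sketch of how a discrepancy between the Fourier phase spectra of a predicted mask and a reference mask could be measured. It is not the loss used in Phi-SegNet, whose formulation is not given here; the function names `phase_spectrum` and `phase_loss`, the `1 - cos` penalty, and the toy square mask are all assumptions chosen for illustration.

```python
import numpy as np


def phase_spectrum(image):
    """Phase angle (radians) of the 2-D discrete Fourier transform."""
    return np.angle(np.fft.fft2(image))


def phase_loss(pred, target):
    """Hypothetical phase discrepancy: mean of 1 - cos(phase difference).

    The value is 0 when both phase spectra agree and at most 2.
    This is an illustrative assumption, not the paper's loss.
    """
    delta = phase_spectrum(pred) - phase_spectrum(target)
    return float(np.mean(1.0 - np.cos(delta)))


# Toy example: a square "object" and a copy shifted by three pixels.
mask = np.zeros((32, 32))
mask[8:20, 8:20] = 1.0
shifted = np.roll(mask, 3, axis=1)

print("identical masks:", phase_loss(mask, mask))
print("shifted mask:   ", phase_loss(shifted, mask))
```

A spatial shift leaves the Fourier magnitude unchanged and alters only the phase, which is why a phase-based term is sensitive to where boundaries lie. In practice, the phase of coefficients with near-zero magnitude is numerically unstable, so a usable loss would likely need magnitude weighting or a similar safeguard.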
