Integrating textual data with imaging in liver tumor segmentation is essential for enhancing diagnostic accuracy. However, current multi-modal medical datasets offer only general text annotations, lacking the lesion-specific details critical for extracting nuanced features, especially for fine-grained segmentation of tumor boundaries and small lesions. To address these limitations, we developed datasets with lesion-specific text annotations for liver tumors and introduced the TexLiverNet model. TexLiverNet employs an agent-based cross-attention module that efficiently integrates text features with visual features, significantly reducing computational cost. Additionally, we propose enhanced spatial and adaptive frequency-domain perception to precisely delineate lesion boundaries, suppress background interference, and recover fine details in small lesions. Comprehensive evaluations on public and private datasets demonstrate that TexLiverNet outperforms current state-of-the-art methods.
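The abstract does not give implementation details for the agent-based cross-attention module, but the general agent-attention idea it alludes to can be sketched: rather than letting every visual token attend to every text token directly (cost proportional to N·M), a small set of A learnable agent tokens first summarizes the text, and visual tokens then attend only to those agents (cost proportional to A·(N+M)). The shapes, the single-head formulation, and the random inputs below are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_cross_attention(visual, text, agents):
    """Single-head agent-based cross-attention sketch.

    visual: (N, d) visual tokens (queries)
    text:   (M, d) text tokens (keys/values)
    agents: (A, d) learnable agent tokens, A << N and A << M
    Returns (N, d) text-conditioned visual features.
    """
    d = visual.shape[-1]
    # Step 1: agents aggregate the text features (A x M attention map).
    agg = softmax(agents @ text.T / np.sqrt(d)) @ text    # (A, d)
    # Step 2: visual tokens attend to the aggregated agents (N x A map),
    # replacing the full N x M cross-attention.
    out = softmax(visual @ agg.T / np.sqrt(d)) @ agg      # (N, d)
    return out

# Hypothetical sizes: 4096 visual tokens, 64 text tokens, 16 agents.
N, M, A, d = 4096, 64, 16, 32
rng = np.random.default_rng(0)
fused = agent_cross_attention(
    rng.standard_normal((N, d)),
    rng.standard_normal((M, d)),
    rng.standard_normal((A, d)),
)
print(fused.shape)  # (4096, 32)
```

With these sizes the attention maps shrink from one N×M = 262,144-entry matrix to A×M + N×A = 66,560 entries, which is the source of the computational savings the abstract claims; the exact fusion design in TexLiverNet may differ.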