Early detection of colorectal cancer hinges on real-time, accurate polyp identification and resection. Yet current high-precision segmentation models rely on GPUs, making them impractical to deploy in primary hospitals, mobile endoscopy units, or capsule robots. To bridge this gap, we present the UltraSeg family, operating in an extreme-compression regime (<0.3 M parameters). UltraSeg-108K (0.108 M parameters) is optimized for single-center data, while UltraSeg-130K (0.13 M parameters) generalizes to multi-center, multi-modal images. By jointly optimizing encoder-decoder widths, incorporating constrained dilated convolutions to enlarge receptive fields, and integrating a cross-layer lightweight fusion module, the models achieve 90 FPS on a single CPU core without sacrificing accuracy. Evaluated on seven public datasets, UltraSeg retains >94% of the Dice score of a 31 M-parameter U-Net while utilizing only 0.4% of its parameters, establishing a strong, clinically viable baseline for the extreme-compression domain and offering an immediately deployable solution for resource-constrained settings. This work provides not only a CPU-native solution for colonoscopy but also a reproducible blueprint for broader minimally invasive surgical vision applications. Source code is publicly available to ensure reproducibility and facilitate future benchmarking.
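The abstract's exact architecture is not specified here, but the <0.3 M parameter budget it describes can be illustrated with a back-of-envelope count for a hypothetical narrow encoder-decoder built from dilated depthwise and pointwise convolutions. The channel widths and layer layout below are illustrative assumptions, not the actual UltraSeg configuration:

```python
# Back-of-envelope parameter budget for a HYPOTHETICAL lightweight
# encoder-decoder in the <0.3 M regime the abstract describes.
# Widths and layer counts are illustrative assumptions, not UltraSeg's.

def conv2d_params(c_in: int, c_out: int, k: int = 3, depthwise: bool = False) -> int:
    """Weights + bias of a 2D convolution (dilation adds no parameters)."""
    if depthwise:
        return c_in * k * k + c_in          # one k x k filter per channel
    return c_in * c_out * k * k + c_out

# Hypothetical jointly-narrowed encoder widths.
widths = [3, 16, 24, 32, 48]

total = 0
for c_in, c_out in zip(widths, widths[1:]):
    total += conv2d_params(c_in, c_in, 3, depthwise=True)  # dilated depthwise 3x3
    total += conv2d_params(c_in, c_out, 1)                 # pointwise projection

# Mirror the encoder for a decoder, plus cheap 1x1 cross-layer fusion convs.
total *= 2
for c in widths[1:]:
    total += conv2d_params(c, c, 1)

print(f"{total / 1e6:.3f} M parameters")
assert total < 300_000  # comfortably inside the extreme-compression budget
```

The key point of the sketch is that dilation enlarges the receptive field for free (it changes sampling stride, not filter size), while depthwise-separable layers keep the parameter count orders of magnitude below a 31 M-parameter U-Net.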