Out-of-distribution (OOD) generalization is a critical challenge in deep learning. It is particularly important when the test samples are drawn from a different distribution than the training data. We develop TransRUPNet, a novel real-time deep learning architecture based on a Transformer and a residual upsampling network, for colorectal polyp segmentation with improved OOD generalization. TransRUPNet is an encoder-decoder network that consists of three encoder blocks, three decoder blocks, and additional upsampling blocks at the end of the network. With an image size of $256\times256$, the proposed method runs in real time at \textbf{47.07} frames per second and achieves a mean Dice coefficient of 0.7786 and a mean Intersection over Union of 0.7210 on the out-of-distribution polyp datasets. Results on the publicly available PolypGen dataset (the OOD dataset in our case) suggest that TransRUPNet can give real-time feedback while retaining high accuracy on in-distribution data. Furthermore, we demonstrate the generalizability of the proposed method by showing that it significantly improves performance on OOD datasets compared to existing methods.
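For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a Dice coefficient and an Intersection over Union are commonly computed for binary segmentation masks. It is not the authors' evaluation code: the function names `dice_and_iou` and `mean_scores`, the smoothing term `eps`, and the choice to average per-image scores are all assumptions made for illustration.

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-7):
    """Return (Dice, IoU) for one pair of binary masks.

    `eps` is an assumed smoothing term that avoids division by zero
    when both masks are empty; the abstract does not specify one.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    intersection = np.logical_and(pred, target).sum()
    pred_area = pred.sum()
    target_area = target.sum()
    union = pred_area + target_area - intersection

    dice = (2.0 * intersection + eps) / (pred_area + target_area + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)


def mean_scores(pairs):
    """Average per-image (Dice, IoU) over an iterable of (pred, target) pairs.

    Per-image averaging is an assumption about how the means are formed.
    """
    scores = np.array([dice_and_iou(p, t) for p, t in pairs])
    mean_dice, mean_iou = scores.mean(axis=0)
    return float(mean_dice), float(mean_iou)


if __name__ == "__main__":
    # Toy 2x2 masks: the prediction covers two pixels, the ground truth one.
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print(dice_and_iou(pred, target))
```

For the same pair of masks, Dice is never lower than IoU, which is consistent with the reported Dice score (0.7786) being higher than the reported IoU (0.7210).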