An efficient deep learning model that can run in real time for polyp detection is crucial to reducing the polyp miss rate during screening procedures. Convolutional neural networks (CNNs) are vulnerable to small changes in the input image. A CNN-based model may miss the same polyp appearing in a series of consecutive frames and produce unstable detection output due to changes in camera pose, lighting conditions, light reflection, etc. In this study, we attempt to tackle this problem by integrating temporal information among neighboring frames. We propose an efficient feature concatenation method for a CNN-based encoder-decoder model without adding complexity to the model. The proposed method incorporates extracted feature maps of previous frames to detect polyps in the current frame. The experimental results demonstrate that the proposed method of feature concatenation improves the overall performance of automatic polyp detection in videos. The following results are obtained on a public video dataset: sensitivity 90.94%, precision 90.53%, and specificity 92.46%.
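As a rough illustration of reusing feature maps from previous frames, the following Python sketch keeps a short history of encoder outputs and concatenates them with the current frame's features along the channel axis. It is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say at which layer the concatenation happens, how many previous frames are used, or how the first frames of a video are handled. The names `TemporalFeatureBuffer`, `num_previous`, and `fake_encoder` are hypothetical, and feature maps are modeled as plain nested lists so the example runs without any deep learning library.

```python
from collections import deque


class TemporalFeatureBuffer:
    """Hypothetical helper: keeps encoder feature maps of the last few frames."""

    def __init__(self, num_previous=2):
        # num_previous is an assumed hyperparameter, not a value from the paper.
        self.num_previous = num_previous
        self.history = deque(maxlen=num_previous)

    def concat(self, current):
        """Return the current channels followed by channels of previous frames.

        `current` is a list of channels; each channel is a 2D list (H x W).
        """
        previous = list(self.history)  # ordered oldest to newest
        # Assumption: at the start of a video the history is short, so pad
        # with copies of the current features to keep the channel count fixed.
        padding = [current] * (self.num_previous - len(previous))
        stacked = list(current)
        for feature_map in padding + previous:
            stacked.extend(feature_map)
        # Store the current features for use by the following frames.
        self.history.append(current)
        return stacked


def fake_encoder(frame_id, channels=2, height=1, width=2):
    """Stand-in for a CNN encoder: every value simply equals the frame id."""
    return [[[frame_id] * width for _ in range(height)] for _ in range(channels)]


buffer = TemporalFeatureBuffer(num_previous=2)
outputs = [buffer.concat(fake_encoder(t)) for t in range(3)]

# For each frame, record which frame every concatenated channel came from.
channel_ids = [[channel[0][0] for channel in out] for out in outputs]
print(channel_ids)
```

In a real model the feature maps would be tensors and the concatenated result would be passed on to the decoder. Because only the previous frames' already-computed features are stored and reused, the encoder runs once per frame in this sketch.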