We present CT-Bound, a fast boundary estimation method for noisy images based on a hybrid convolution and Transformer neural network. The proposed architecture decomposes boundary estimation into two tasks: local detection and global regularization of image boundaries. It first estimates a parametric representation of boundary structures using only the input image within a small receptive field, and then refines the boundary structure in the parameter domain without accessing the input image. As a result, part of the network can be trained on simple synthetic images and still generalize to real images, and the entire architecture is computationally efficient because the boundary refinement is non-iterative and is not performed in the image domain. Our experiments show that, compared with the previous highest-accuracy methods, CT-Bound is 100 times faster while producing comparably accurate, high-quality boundary and color maps. We also demonstrate that CT-Bound can produce boundary and color maps on real captured images without extra fine-tuning, and can generate boundary-map and color-map videos in real time at ten frames per second.
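The two-stage decomposition can be illustrated with a minimal, dependency-free Python sketch. It is not CT-Bound's implementation. The per-patch "parametric representation" is assumed here to be just the mean intensity gradient (gx, gy) of each patch. The functions `local_detect`, `global_refine`, and `noisy_step` are hypothetical names. Both stages are hand-written stand-ins for the trained convolutional and Transformer networks. The sketch only shows the data flow: stage 1 sees a small patch of the image, while stage 2 receives only the grid of parameters and refines it in a single non-iterative pass.

```python
import math
import random
from typing import List, Tuple

Image = List[List[float]]
Params = Tuple[float, float]  # hypothetical per-patch parameters: mean gradient (gx, gy)
Grid = List[List[Params]]


def local_detect(img: Image, k: int) -> Grid:
    """Stage 1, toy stand-in for the convolutional part: estimate each
    non-overlapping k x k patch's parameters from that patch's pixels only."""
    assert k >= 2, "need at least 2x2 pixels to take finite differences"
    h, w = len(img), len(img[0])
    grid: Grid = []
    for r0 in range(0, h - k + 1, k):
        row: List[Params] = []
        for c0 in range(0, w - k + 1, k):
            gx = gy = 0.0
            n = 0
            # Forward differences that stay inside the patch (small receptive field).
            for r in range(r0, r0 + k - 1):
                for c in range(c0, c0 + k - 1):
                    gx += img[r][c + 1] - img[r][c]
                    gy += img[r + 1][c] - img[r][c]
                    n += 1
            row.append((gx / n, gy / n))
        grid.append(row)
    return grid


def global_refine(grid: Grid, tau: float = 1.0) -> Grid:
    """Stage 2, toy stand-in for the Transformer: one self-attention pass with
    identity query/key/value projections over the patch parameters.
    No image argument: refinement happens purely in the parameter domain."""
    tokens = [p for row in grid for p in row]
    refined: List[Params] = []
    for q in tokens:
        scores = [(q[0] * t[0] + q[1] * t[1]) / tau for t in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        refined.append((
            sum(wt * t[0] for wt, t in zip(weights, tokens)) / z,
            sum(wt * t[1] for wt, t in zip(weights, tokens)) / z,
        ))
    cols = len(grid[0])
    return [refined[i:i + cols] for i in range(0, len(refined), cols)]


def noisy_step(size: int, edge_col: int, sigma: float, seed: int = 0) -> Image:
    """A simple synthetic image: a vertical step edge plus Gaussian noise."""
    rng = random.Random(seed)
    return [[(1.0 if c >= edge_col else 0.0) + rng.gauss(0.0, sigma)
             for c in range(size)] for _ in range(size)]


if __name__ == "__main__":
    img = noisy_step(size=8, edge_col=2, sigma=0.1)
    params = local_detect(img, k=4)   # 2 x 2 grid of (gx, gy)
    refined = global_refine(params)   # the image is not passed here
    for row in refined:
        print([(round(gx, 3), round(gy, 3)) for gx, gy in row])
```

Because `global_refine` never receives the image, its cost scales with the number of parameter tokens rather than with per-pixel operations. In a simplified way, this mirrors why the refinement stage is described as cheap. The untrained toy attention simply averages similar tokens, so it also smooths patches that contain no edge. A learned Transformer would instead use trained projections to decide which neighbors to trust.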