HAISTA-NET: Human Assisted Instance Segmentation Through Attention

Instance segmentation is a form of image detection with a range of applications, such as object refinement, medical image analysis, and image/video editing, all of which demand a high degree of accuracy. However, this precision is often beyond what even state-of-the-art, fully automated instance segmentation algorithms can deliver. The performance gap becomes particularly prohibitive for small and complex objects, so practitioners typically resort to fully manual annotation, which is a laborious process. To overcome this problem, we propose a novel approach that enables more precise predictions and generates higher-quality segmentation masks for high-curvature, complex, and small-scale objects. Our human-assisted segmentation model, HAISTA-NET, augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries. We also present a dataset of hand-drawn partial object boundaries, which we refer to as human attention maps. In addition, the Partial Sketch Object Boundaries (PSOB) dataset contains hand-drawn partial object boundaries that represent the curvature of an object's ground truth mask with several pixels. Through extensive evaluation on the PSOB dataset, we show that HAISTA-NET outperforms state-of-the-art methods such as Mask R-CNN, Strong Mask R-CNN, and Mask2Former, improving AP-Mask by +36.7, +29.6, and +26.5 points over these three models, respectively. We hope that our approach, which combines fully automated and interactive instance segmentation architectures, will set a baseline for future human-aided deep learning models.
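To make the notion of a human attention map concrete, the following minimal sketch rasterizes a hand-drawn partial boundary into a binary map and stacks it onto an RGB image as an extra channel. The abstract does not state how HAISTA-NET feeds partial boundaries into Strong Mask R-CNN, so the extra-channel design is an assumption made only for illustration, and the function names `rasterize_partial_boundary` and `stack_attention_channel` are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def rasterize_partial_boundary(
    height: int, width: int, points: Sequence[Tuple[int, int]]
) -> List[List[int]]:
    """Return a binary map with 1 at each hand-drawn boundary pixel (row, col)."""
    attention = [[0] * width for _ in range(height)]
    for row, col in points:
        # Ignore stroke points that fall outside the image.
        if 0 <= row < height and 0 <= col < width:
            attention[row][col] = 1
    return attention


def stack_attention_channel(
    image: List[List[Pixel]], attention: List[List[int]]
) -> List[List[Tuple[int, int, int, int]]]:
    """Append the attention map as a fourth channel (assumed design, not the paper's)."""
    if len(image) != len(attention) or any(
        len(img_row) != len(att_row) for img_row, att_row in zip(image, attention)
    ):
        raise ValueError("image and attention map must have the same height and width")
    return [
        [(*pixel, a) for pixel, a in zip(img_row, att_row)]
        for img_row, att_row in zip(image, attention)
    ]


# Toy 3x4 image and a short diagonal stroke; the point (5, 5) lies off-canvas.
image = [[(10, 20, 30)] * 4 for _ in range(3)]
stroke = [(0, 1), (1, 2), (2, 3), (5, 5)]
attention = rasterize_partial_boundary(3, 4, stroke)
augmented = stack_attention_channel(image, attention)
```

In this toy setup, only a few pixels along the object outline are marked, which mirrors the idea of a partial boundary drawn "with several pixels"; a real pipeline would work on image tensors instead of nested lists.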
