HAISTA-NET: Human Assisted Instance Segmentation Through Attention

Instance segmentation is a form of image detection with a range of applications, such as object refinement, medical image analysis, and image/video editing, all of which demand a high degree of accuracy. However, this precision is often beyond what even state-of-the-art, fully automated instance segmentation algorithms can deliver. The performance gap becomes particularly prohibitive for small and complex objects, so practitioners typically resort to fully manual annotation, which is laborious. To overcome this problem, we propose a novel approach that enables more precise predictions and generates higher-quality segmentation masks for high-curvature, complex, and small-scale objects. Our human-assisted segmentation model, HAISTA-NET, augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries. We also present a dataset of hand-drawn partial object boundaries, which we refer to as human attention maps. The Partial Sketch Object Boundaries (PSOB) dataset contains hand-drawn partial object boundaries that capture the curvature of an object's ground-truth mask using several pixels. Through extensive evaluation on the PSOB dataset, we show that HAISTA-NET outperforms state-of-the-art methods such as Mask R-CNN, Strong Mask R-CNN, and Mask2Former, improving AP-Mask by +36.7, +29.6, and +26.5 points over these three models, respectively. We hope that our approach will set a baseline for future human-aided deep learning models by combining fully automated and interactive instance segmentation architectures.
