Interactive segmentation has recently been explored as a way to obtain high-quality segmentation masks effectively and efficiently by iteratively incorporating user hints. Although the process is iterative by nature, most existing interactive segmentation methods ignore the dynamics of successive interactions and treat each interaction independently. We propose to model iterative interactive image segmentation as a Markov decision process (MDP) and to solve it with reinforcement learning (RL), where each voxel is treated as an agent. Given the large exploration space of voxel-wise prediction and the dependence among neighboring voxels in segmentation tasks, we adopt multi-agent reinforcement learning, in which the voxel-level policy is shared among agents. Because boundary voxels are more important for segmentation, we further introduce a boundary-aware reward with two components: a global reward in the form of relative cross-entropy gain, which updates the policy in a constrained direction, and a boundary reward in the form of relative weight, which emphasizes the correctness of boundary predictions. To combine the advantages of different interaction types, namely the simplicity and efficiency of point clicks and the stability and robustness of scribbles, we propose an interaction design based on supervoxel clicking. Experimental results on four benchmark datasets show that the proposed method significantly outperforms state-of-the-art methods, requiring fewer interactions while achieving higher accuracy and greater robustness.
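The boundary-aware reward described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: "relative cross-entropy gain" is read as the per-voxel drop in binary cross-entropy between two successive refinement steps, "relative weight" is read as a constant multiplier applied to boundary voxels, and the names `voxel_cross_entropy`, `boundary_mask`, `boundary_aware_reward`, and `boundary_weight` are hypothetical.

```python
import numpy as np


def voxel_cross_entropy(prob, target, eps=1e-7):
    """Per-voxel binary cross-entropy between foreground probabilities and labels."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))


def boundary_mask(target):
    """Mark voxels that have an axis-adjacent neighbour with a different label."""
    boundary = np.zeros(target.shape, dtype=bool)
    for axis in range(target.ndim):
        changed = np.diff(target, axis=axis) != 0
        lower = [slice(None)] * target.ndim
        upper = [slice(None)] * target.ndim
        lower[axis] = slice(0, -1)
        upper[axis] = slice(1, None)
        # Both voxels on either side of a label change count as boundary.
        boundary[tuple(lower)] |= changed
        boundary[tuple(upper)] |= changed
    return boundary


def boundary_aware_reward(prev_prob, curr_prob, target, boundary_weight=2.0):
    """Assumed reward: cross-entropy gain over the previous step, up-weighted on boundaries."""
    gain = voxel_cross_entropy(prev_prob, target) - voxel_cross_entropy(curr_prob, target)
    weights = np.where(boundary_mask(target), boundary_weight, 1.0)
    return weights * gain


# Toy volume: one row of four voxels, with a label change in the middle.
target = np.array([[[0.0, 0.0, 1.0, 1.0]]])
prev_prob = np.full(target.shape, 0.5)
curr_prob = np.array([[[0.2, 0.2, 0.8, 0.8]]])
reward = boundary_aware_reward(prev_prob, curr_prob, target)
```

In this sketch every voxel (agent) receives its own reward: it is positive when the new prediction moves closer to the ground truth than the previous one, and the two voxels adjacent to the label change receive twice the reward of the interior voxels under the assumed `boundary_weight=2.0`.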