Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification

Uncertainty Quantification (UQ) is crucial for reliable image segmentation. Yet, while the field sees continual development of novel methods, a lack of agreed-upon benchmarks limits their systematic comparison and evaluation: current UQ methods are typically tested either on overly simplistic toy datasets or on complex real-world datasets in which the true uncertainty cannot be discerned. To unify controllability and complexity, we introduce Arctique, a procedurally generated dataset modeled after histopathological colon images. We chose histopathological images for two reasons: 1) their complexity in terms of intricate object structures and highly variable appearance, which yields challenging segmentation problems, and 2) their broad prevalence in medical diagnosis and the corresponding relevance of high-quality UQ. To generate Arctique, we established a Blender-based framework for 3D scene creation with intrinsic noise manipulation. Arctique contains 50,000 rendered images with precise masks as well as noisy label simulations. We show that by independently controlling the uncertainty in both images and labels, we can effectively study the performance of several commonly used UQ methods. Hence, Arctique serves as a critical resource for benchmarking and advancing UQ techniques and other methodologies in complex, multi-object environments, bridging the gap between realism and controllability. All code is publicly available, allowing re-creation and controlled manipulation of our shipped images as well as creation and rendering of new scenes.
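As a rough illustration of what controlling label uncertainty and measuring predictive uncertainty can look like, the following minimal sketch may help. It is not taken from the Arctique code base: `flip_labels` is a hypothetical label-noise model (uniform random class flips) and `predictive_entropy` is the standard entropy of an ensemble-averaged class distribution, one of many possible UQ scores. Both names, the flat-list mask representation, and the noise model are assumptions made for this example.

```python
import math
import random


def flip_labels(mask, n_classes, flip_rate, seed=0):
    """Hypothetical label-noise model (an assumption, not the paper's method):
    with probability flip_rate, replace a pixel label by a different class."""
    rng = random.Random(seed)
    noisy = []
    for label in mask:
        if rng.random() < flip_rate:
            # Pick uniformly among the other classes so the label really changes.
            noisy.append(rng.choice([c for c in range(n_classes) if c != label]))
        else:
            noisy.append(label)
    return noisy


def predictive_entropy(member_probs):
    """Entropy (in nats) of the ensemble-averaged class distribution for one pixel.

    member_probs holds one probability vector per ensemble member."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(q * math.log(q) for q in mean if q > 0.0)


if __name__ == "__main__":
    clean_mask = [0, 1, 2, 1, 0, 2]  # toy flattened segmentation mask
    noisy_mask = flip_labels(clean_mask, n_classes=3, flip_rate=0.5)
    changed = sum(a != b for a, b in zip(clean_mask, noisy_mask))
    print(f"{changed} of {len(clean_mask)} labels were flipped")

    agreeing = [[0.9, 0.1], [0.9, 0.1]]     # ensemble members agree
    disagreeing = [[0.9, 0.1], [0.1, 0.9]]  # ensemble members disagree
    print(predictive_entropy(agreeing) < predictive_entropy(disagreeing))
```

In a benchmark of this kind, one would expect a useful uncertainty score to rise as the injected noise level (here `flip_rate`) increases; how the actual dataset parameterizes image and label noise is described by its own code, not by this sketch.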
