M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations

The ability to detect manipulated visual content is becoming increasingly important in many application fields, given the rapid advances in image synthesis methods. Of particular concern is the possibility of modifying the content of medical images, altering the resulting diagnoses. Despite its relevance, this issue has received limited attention from the research community. One reason is the lack of large and curated datasets to use for development and benchmarking purposes. Here, we investigate this issue and propose M3Dsynth, a large dataset of manipulated Computed Tomography (CT) lung images. We create manipulated images by injecting or removing lung cancer nodules in real CT scans, using three different methods based on Generative Adversarial Networks (GAN) or Diffusion Models (DM), for a total of 8,577 manipulated samples. Experiments show that these images easily fool automated diagnostic tools. We also tested several state-of-the-art forensic detectors and demonstrated that, once trained on the proposed dataset, they are able to accurately detect and localize manipulated synthetic content, even when training and test sets are not aligned, showing good generalization ability. Dataset and code are publicly available at https://grip-unina.github.io/M3Dsynth/.


