SkinGenBench: Generative Model and Preprocessing Effects for Synthetic Dermoscopic Augmentation in Melanoma Diagnosis

This work introduces SkinGenBench, a systematic biomedical imaging benchmark that investigates how preprocessing complexity interacts with generative model choice for synthetic dermoscopic image augmentation and downstream melanoma diagnosis. Using a curated dataset of $14,116$ dermoscopic images from HAM10000 and MILK10K across five lesion classes, we evaluate two representative generative paradigms, StyleGAN2-ADA and Denoising Diffusion Probabilistic Models (DDPMs), under both basic geometric augmentation and advanced artifact removal pipelines. Synthetic melanoma images are assessed using established perceptual and distributional metrics (FID, KID, IS), feature space analysis, and their impact on diagnostic performance across five downstream classifiers. Experimental results demonstrate that generative architecture choice has a stronger influence on both image fidelity and diagnostic utility than preprocessing complexity. StyleGAN2-ADA consistently produced synthetic images more closely aligned with real data distributions, achieving the lowest FID ($\approx 65.5$) and KID ($\approx 0.05$), while diffusion models generated higher-variance samples at the cost of reduced perceptual fidelity and class anchoring. Advanced artifact removal yielded only marginal improvements in generative metrics and provided limited downstream diagnostic gains, suggesting possible suppression of clinically relevant texture cues. In contrast, synthetic data augmentation substantially improved melanoma detection, yielding absolute gains of $8$-$15$\% in melanoma F1-score; ViT-B/16 achieved F1 $\approx 0.88$ and ROC-AUC $\approx 0.98$, an improvement of approximately $14\%$ over non-augmented baselines. Our code is available at https://github.com/adarsh-crafts/SkinGenBench.
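To make the KID figure above more concrete, the following is a minimal sketch of the standard unbiased Kernel Inception Distance estimator: a squared maximum mean discrepancy with the cubic polynomial kernel $k(x, y) = (x^\top y / d + 1)^3$. It is an illustration only, not the benchmark's implementation. The function names `poly_kernel` and `kid_unbiased` are hypothetical, and the plain Python lists stand in for Inception feature embeddings of real and synthetic images, which are assumed to be extracted elsewhere. Evaluation pipelines often average this estimate over several random subsets; the sketch computes a single estimate.

```python
from typing import Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector) -> float:
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3, d = feature dimension."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid_unbiased(real: Sequence[Vector], fake: Sequence[Vector]) -> float:
    """Unbiased squared-MMD estimate between two sets of feature vectors.

    Hypothetical helper: `real` and `fake` are assumed to be precomputed
    feature embeddings (e.g. from an Inception network), one vector per image.
    """
    m, n = len(real), len(fake)
    if m < 2 or n < 2:
        raise ValueError("need at least two feature vectors per set")

    # Within-set terms skip the diagonal (i == j) to keep the estimator unbiased.
    k_rr = sum(poly_kernel(real[i], real[j])
               for i in range(m) for j in range(m) if i != j)
    k_ff = sum(poly_kernel(fake[i], fake[j])
               for i in range(n) for j in range(n) if i != j)
    # Cross term uses every real/fake pair.
    k_rf = sum(poly_kernel(x, y) for x in real for y in fake)

    return k_rr / (m * (m - 1)) + k_ff / (n * (n - 1)) - 2.0 * k_rf / (m * n)


# Toy 1-D "features" (placeholders, not real embeddings).
print(kid_unbiased([[1.0], [1.0]], [[0.0], [0.0]]))  # 7.0
```

Lower values indicate that the synthetic feature distribution sits closer to the real one. Because the estimator is unbiased, it can come out slightly negative when the two sets are drawn from the same distribution.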
