Semantic segmentation of microscopy images is a critical task for high-throughput materials characterisation, yet its automation is severely constrained by the prohibitive cost, subjectivity, and scarcity of expert-annotated data. While physics-based simulations offer a scalable alternative to manual labelling, models trained on such data have historically failed to generalise due to a significant domain gap: simulations lack the complex textures, noise patterns, and imaging artefacts inherent to experimental data. This paper introduces a novel framework for labour-free segmentation that successfully bridges this simulation-to-reality gap. Our pipeline leverages phase-field simulations to generate an abundant source of microstructural morphologies with perfect, intrinsically derived ground-truth masks. We then employ a Cycle-Consistent Generative Adversarial Network (CycleGAN) for unpaired image-to-image translation, transforming the clean simulations into a large-scale dataset of high-fidelity, realistic SEM images. A U-Net model, trained exclusively on this synthetic data, demonstrated remarkable generalisation when deployed on unseen experimental images, achieving a mean Boundary F1-Score of 0.90 and an Intersection over Union (IoU) of 0.88. Comprehensive validation using t-SNE feature-space projection and Shannon entropy analysis confirms that our synthetic images are indistinguishable from the real data manifold, both statistically and in feature space. By completely decoupling model training from manual annotation, our generative framework transforms a data-scarce problem into one of data abundance, providing a robust and fully automated solution to accelerate materials discovery and analysis.
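To make the reported segmentation metric concrete, the Intersection over Union for binary masks can be computed as below. This is an illustrative sketch only, not the authors' evaluation code; the function name and example masks are assumptions for demonstration.

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union (Jaccard index) for two binary masks.

    Counts pixels where both masks are foreground (intersection) and
    where at least one is foreground (union); empty union is treated
    as a perfect match.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter) / float(union) if union else 1.0

# Toy 2x3 masks: intersection = 2 pixels, union = 4 pixels.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(iou(pred, target))  # 0.5
```

A mean IoU of 0.88, as reported above, corresponds to averaging this quantity per image (or per class) over the experimental test set.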