Deep generative model that uses physical quantities to generate and retrieve solar magnetic active regions

Deep generative models have shown immense potential for generating unseen data that share the properties of real data. These models learn complex data-generating distributions from a small set of latent dimensions. However, generative models have met with considerable skepticism in scientific domains because their latent vectors are disconnected from scientifically relevant quantities. In this study, we integrate three types of machine learning models to generate solar magnetic patches in a physically interpretable manner and use them as queries to find matching patches in real observations. We use magnetic field measurements from Space-weather HMI Active Region Patches (SHARPs) to train a Generative Adversarial Network (GAN). We pair the physical properties of GAN-generated images with their latent vectors to train Support Vector Machines (SVMs) that map between the physical and latent spaces. The SVMs yield directions in the GAN latent space along which known physical parameters of the SHARPs change. We train a self-supervised learner (SSL) to query real data with generated images and find matches. We find that the GAN-SVM combination enables users to produce high-quality patches that change smoothly with only the prescribed physical quantity, making generative models physically interpretable. We also show that GAN outputs can be used to retrieve real data that share the physical properties of the generated query. This elevates Generative Artificial Intelligence (AI) from a means of producing artificial data to a novel tool for scientific data interrogation, supporting its applicability beyond the domain of heliophysics.
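The latent-direction and retrieval ideas summarized above can be illustrated with a toy sketch. This is not the pipeline used in the study; every name and number in it is an assumption made for illustration. Random 8-dimensional vectors stand in for GAN latent vectors and for SSL embeddings. The hypothetical `physical_quantity` function stands in for a SHARP parameter measured on a generated patch. A small linear SVM, trained here by subgradient descent on the hinge loss, stands in for the SVMs. Cosine-similarity nearest-neighbor search is an assumed choice for the retrieval step.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def train_linear_svm(X, y, epochs=30, lr=0.01, lam=0.01):
    """Fit w, b by subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (dot(w, xi) + b) < 1.0   # margin checked before the update
            w = [wj * (1.0 - lr * lam) for wj in w]  # shrink step from the L2 penalty
            if violated:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def retrieve(query, bank, k=1):
    """Indices of the k bank embeddings most cosine-similar to the query."""
    q = unit(query)
    scores = [dot(q, unit(e)) for e in bank]
    return sorted(range(len(bank)), key=lambda i: -scores[i])[:k]

random.seed(0)
DIM = 8  # assumed toy latent size; real GAN latent spaces are much larger

# Hypothetical ground truth: the toy quantity grows along this latent axis.
true_dir = unit([random.gauss(0, 1) for _ in range(DIM)])

def physical_quantity(z):
    # Stand-in for a physical parameter measured on the patch generated from z.
    return dot(z, true_dir)

# Sample latents, "measure" the quantity, and label each sample high or low.
latents = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(400)]
values = [physical_quantity(z) for z in latents]
median = sorted(values)[len(values) // 2]
labels = [1 if v > median else -1 for v in values]

# The normal of the SVM hyperplane is the latent direction for this quantity.
w, b = train_linear_svm(latents, labels)
direction = unit(w)
alignment = dot(direction, true_dir)

# Walking a latent vector along the direction should change the quantity smoothly.
z0 = latents[0]
walk = [physical_quantity([zj + a * dj for zj, dj in zip(z0, direction)])
        for a in (-2, -1, 0, 1, 2)]

# Retrieval: treat some vectors as embeddings of real patches and query them.
bank = latents[:50]
query = [x + 0.01 for x in bank[7]]  # slightly perturbed copy of entry 7
best = retrieve(query, bank, k=1)
```

In this sketch, `alignment` measures how well the SVM normal recovers the hidden axis, `walk` lists the toy quantity at five steps along the learned direction, and `best` holds the index of the bank entry closest to the query.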
