RADiff: Controllable Diffusion Models for Radio Astronomical Maps Generation

Renato Sortino, Thomas Cecconello, Andrea DeMarco, Giuseppe Fiameni, Andrea Pilzer, Andrew M. Hopkins, Daniel Magro, Simone Riggi, Eva Sciacca, Adriano Ingallinera, Cristobal Bordiu, Filomena Bufano, Concetto Spampinato

As the Square Kilometre Array (SKA) nears completion, demand is growing for accurate and reliable automated solutions that extract valuable information from the vast amount of data it will acquire. Automated source finding is a particularly important task in this context, as it enables the detection and classification of astronomical objects. Deep-learning-based object detection and semantic segmentation models have proven suitable for this purpose. However, training such deep networks requires a large volume of labeled data, which is not trivial to obtain in radio astronomy: because the data must be labeled manually by experts, the process does not scale to large datasets, which limits the use of deep networks for several tasks. In this work, we propose RADiff, a generative approach based on conditional diffusion models trained on an annotated radio dataset to generate synthetic images containing radio sources of different morphologies, in order to augment existing datasets and reduce the problems caused by class imbalance. We also show that it is possible to generate fully synthetic image-annotation pairs to automatically augment any annotated dataset. We evaluate the effectiveness of this approach by training a semantic segmentation model on a real dataset augmented in two ways: 1) using synthetic images obtained from real masks, and 2) generating images from synthetic semantic masks. Augmentation improves performance, with gains of up to 18% when using real masks and 4% when augmenting with synthetic masks. Finally, we employ this model to generate large-scale radio maps with the objective of simulating Data Challenges.
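The abstract does not spell out the diffusion formulation. As a minimal sketch, assuming the standard DDPM forward process with a linear noise schedule and an epsilon-prediction objective, the code below shows how a single mask-conditioned training example could be assembled. The schedule values, the channel-concatenation conditioning, and every function name (`linear_beta_schedule`, `q_sample`, `make_training_pair`, `eps_mse_loss`) are illustrative assumptions, not RADiff's actual implementation; random arrays stand in for real radio cutouts and masks.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Noise variances beta_1..beta_T, linearly spaced (common DDPM defaults, assumed here)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0: np.ndarray, t: int, alphas_cumprod: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar_t = alphas_cumprod[t]
    return np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * noise


def make_training_pair(image, mask, t, alphas_cumprod, rng):
    """Build one (input, target) pair for a hypothetical mask-conditioned noise predictor.

    Stacking the semantic mask onto the noisy image along the channel axis is an
    illustrative assumption, not necessarily the conditioning RADiff uses.
    """
    noise = rng.standard_normal(image.shape)
    x_t = q_sample(image, t, alphas_cumprod, noise)
    model_input = np.concatenate([x_t, mask], axis=0)
    return model_input, noise  # the network would be trained to predict `noise`


def eps_mse_loss(predicted_noise: np.ndarray, true_noise: np.ndarray) -> float:
    """Epsilon-prediction objective: mean squared error between predicted and true noise."""
    return float(np.mean((predicted_noise - true_noise) ** 2))


# Usage with random stand-in data (no real radio maps involved).
rng = np.random.default_rng(0)
T = 1000
betas = linear_beta_schedule(T)
alphas_cumprod = np.cumprod(1.0 - betas)

image = rng.uniform(-1.0, 1.0, size=(1, 64, 64))            # normalized single-channel cutout
mask = rng.integers(0, 4, size=(1, 64, 64)).astype(float)   # hypothetical class-index mask
model_input, target = make_training_pair(image, mask, t=500, alphas_cumprod=alphas_cumprod, rng=rng)
print(model_input.shape)  # (2, 64, 64)

# A trivial "predictor" that always outputs zeros, only to exercise the loss.
print(eps_mse_loss(np.zeros_like(target), target))
```

Feeding the mask as an extra input channel is one simple way to make generation controllable by a semantic layout: the same network could then be given either real masks or synthetic ones, mirroring the two augmentation settings described in the abstract.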