Deep learning (DL) techniques have broad applications in science, especially in streamlining the path to potential solutions and discoveries. Frequently, however, DL models are trained on simulation results yet applied to real experimental data. Any systematic differences between the simulated and real data may therefore degrade the model's performance, an effect known as "domain shift." This work studies a toy model of the systematic differences between simulated and real data. It presents a fully unsupervised, task-agnostic method to reduce differences between two systematically different samples. The method is based on recent advances in unpaired image-to-image translation and is validated on two sets of samples of simulated Liquid Argon Time Projection Chamber (LArTPC) detector events, created to illustrate, in a controlled way, common systematic differences between simulated and real data. LArTPC-based detectors represent the next generation of particle detectors, producing unique high-resolution particle track data. This work open-sources the generated LArTPC data set, called Simple Liquid-Argon Track Samples (SLATS), allowing researchers from diverse domains to study LArTPC-like data for the first time.