Sim-to-real supervised domain adaptation for radioisotope identification

Machine learning has the potential to improve the speed and reliability of radioisotope identification using gamma spectroscopy. However, meticulously labeling an experimental dataset for training is often prohibitively expensive, while training models purely on synthetic data is risky due to the domain gap between simulated and experimental measurements. In this work, we demonstrate that supervised domain adaptation can substantially improve the performance of radioisotope identification models by transferring knowledge between synthetic and experimental data domains. We consider two domain adaptation scenarios: (1) a simulation-to-simulation adaptation, where we perform multi-label proportion estimation using simulated high-purity germanium detectors, and (2) a simulation-to-experiment adaptation, where we perform multi-class, single-label classification using measured spectra from handheld lanthanum bromide (LaBr) and sodium iodide (NaI) detectors. We begin by pretraining a custom transformer-based spectral classifier on synthetic data. After fine-tuning on just 64 labeled experimental spectra, the model achieves a test accuracy of 96% in the sim-to-real scenario with a LaBr detector, far surpassing both a synthetic-only baseline model (75%) and a model trained from scratch on the same 64 spectra (80%). Furthermore, we demonstrate that domain-adapted models learn more human-interpretable features than experiment-only baseline models. Overall, our results highlight the potential for supervised domain adaptation techniques to bridge the sim-to-real gap in radioisotope identification, enabling the development of accurate and explainable classifiers even in real-world scenarios where access to experimental data is limited.
