Adversarial training of Normalizing Flow (NF) models is an emerging research area that aims to improve model robustness by training on adversarial examples. In this study, we apply adversarial training to NF models for gravitational wave parameter estimation. We propose an adaptive epsilon method for Fast Gradient Sign Method (FGSM) adversarial training, which dynamically adjusts the perturbation strength based on gradient magnitudes using logarithmic scaling. Our hybrid architecture, which combines a ResNet with an Inverse Autoregressive Flow (IAF), reduces the Negative Log Likelihood (NLL) loss under FGSM attacks by 47\% relative to the baseline model, while maintaining an NLL of 4.2 on clean data (only 5\% higher than the baseline). For perturbation strengths between 0.01 and 0.1, our model achieves an average NLL of 5.8 (lower is better), outperforming both the fixed-epsilon (NLL: 6.7) and progressive-epsilon (NLL: 7.2) methods. Under stronger Projected Gradient Descent (PGD) attacks with a perturbation strength of 0.05, our model maintains an NLL of 6.4, demonstrating superior robustness while avoiding catastrophic overfitting.
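The abstract does not state the exact adaptive-epsilon rule, so the following is only a minimal sketch of one way a gradient-dependent, logarithmically scaled FGSM step could look. Everything in it is an assumption made for illustration: the function names (`adaptive_epsilon`, `fgsm_perturb`), the choice to let epsilon grow with the gradient norm, the use of 0.01 and 0.1 as bounds (borrowed from the evaluation range quoted above), and the unit-variance Gaussian NLL that stands in for the flow's loss. It is written in plain Python to stay dependency-free; a real implementation would obtain the gradient from an autodiff framework.

```python
import math


def gaussian_nll(x, mu):
    """Toy stand-in for the flow's NLL: unit-variance Gaussian centred at mu."""
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return 0.5 * sq + 0.5 * len(x) * math.log(2.0 * math.pi)


def gaussian_nll_grad(x, mu):
    """Analytic gradient of gaussian_nll with respect to the input x."""
    return [xi - mi for xi, mi in zip(x, mu)]


def adaptive_epsilon(grad, eps_min=0.01, eps_max=0.1):
    """Hypothetical rule: map the gradient L2 norm to [eps_min, eps_max).

    log1p compresses large norms; s / (1 + s) squashes the result to [0, 1).
    The direction (larger gradient -> larger epsilon) is an assumption.
    """
    grad_norm = math.sqrt(sum(g * g for g in grad))
    s = math.log1p(grad_norm)
    weight = s / (1.0 + s)
    return eps_min + (eps_max - eps_min) * weight


def fgsm_perturb(x, grad, eps):
    """Standard FGSM step: move each coordinate by eps along sign(grad)."""
    def sign(g):
        return (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Example: perturb one input so that its NLL increases.
x = [1.0, -2.0, 0.5]
mu = [0.0, 0.0, 0.0]
grad = gaussian_nll_grad(x, mu)
eps = adaptive_epsilon(grad)
x_adv = fgsm_perturb(x, grad, eps)
print(f"eps={eps:.4f}", gaussian_nll(x, mu), gaussian_nll(x_adv, mu))
```

In adversarial training, `x_adv` would be fed back into the loss alongside (or instead of) the clean input at each step. Bounding epsilon between fixed limits is one plausible way to keep the perturbation from collapsing to zero or growing without limit, but whether the method described above uses such bounds is not stated.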