Adversarial training with Normalizing Flow (NF) models is an emerging research area aimed at improving model robustness through adversarial samples. In this study, we focus on applying adversarial training to NF models for gravitational wave parameter estimation. We propose an adaptive epsilon method for Fast Gradient Sign Method (FGSM) adversarial training, which dynamically adjusts perturbation strengths based on gradient magnitudes using logarithmic scaling. Our hybrid architecture, combining ResNet and Inverse Autoregressive Flow, reduces the Negative Log Likelihood (NLL) loss by 47\% under FGSM attacks compared to the baseline model, while maintaining an NLL of 4.2 on clean data (only 5\% higher than the baseline). For perturbation strengths between 0.01 and 0.1, our model achieves an average NLL of 5.8, outperforming both fixed-epsilon (NLL: 6.7) and progressive-epsilon (NLL: 7.2) methods. Under stronger Projected Gradient Descent attacks with perturbation strength of 0.05, our model maintains an NLL of 6.4, demonstrating superior robustness while avoiding catastrophic overfitting.
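The adaptive-epsilon idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact logarithmic-scaling formula is not given in the abstract, so the mapping below (gradient norm compressed by `log1p` and rescaled into the stated `[0.01, 0.1]` range, with an assumed reference norm `g_ref`) is a hypothetical form chosen only to show the mechanism.

```python
import numpy as np

def adaptive_epsilon(grad, eps_min=0.01, eps_max=0.1, g_ref=10.0):
    """Map the gradient magnitude to a perturbation strength in
    [eps_min, eps_max] via logarithmic scaling.

    Hypothetical form: the paper's exact formula is not given in the
    abstract. g_ref is an assumed reference norm at which epsilon
    saturates at eps_max.
    """
    g = float(np.linalg.norm(grad))
    # log1p compresses the dynamic range of gradient norms,
    # so epsilon grows quickly for small gradients and saturates.
    frac = np.log1p(g) / np.log1p(g_ref)
    return float(np.clip(eps_min + (eps_max - eps_min) * frac,
                         eps_min, eps_max))

def fgsm_perturb(x, grad, eps):
    """Standard FGSM step: shift each input component by eps in the
    sign direction of the loss gradient."""
    return x + eps * np.sign(grad)

# Toy usage with a quadratic "loss" L(x) = 0.5 * ||x||^2, so grad = x.
x = np.array([0.5, -1.0, 2.0])
grad = x
eps = adaptive_epsilon(grad)
x_adv = fgsm_perturb(x, grad, eps)
```

In a real training loop, `grad` would be the gradient of the NLL loss with respect to the input strain data, and `x_adv` would be fed back into the flow model as an adversarial training sample.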