With the steady rise of AI in biotechnical applications and the widespread adoption of genomic sequencing, a growing number of AI-based algorithms and tools are entering research and production, affecting critical decision-making streams such as drug discovery and clinical outcomes. This paper demonstrates the vulnerability of AI models commonly used for downstream tasks on recognized public genomics datasets. We undermine model robustness by deploying an attack that transforms the input so that it mimics real data while confusing the model's decision-making, ultimately yielding a pronounced deterioration in model performance. We further enhance the approach by generating poisoned data with a variational autoencoder-based model. Our empirical findings show a clear decline in model performance, reflected in lower accuracy and an increase in false positives and false negatives. Finally, we analyze the resulting adversarial samples via spectral analysis, which yields conclusions for countermeasures against such attacks.