Adversarial robustness evaluation underpins every claim of trustworthy ML deployment, yet the field suffers from fragmented protocols and undetected gradient masking. We make two contributions. (1) Structured synthesis. We analyze nine peer-reviewed sources (2020--2026) through seven complementary protocols, producing the first end-to-end structured analysis of the field's consensus and unresolved challenges. (2) Auto-ART framework. We introduce Auto-ART, an open-source framework that operationalizes the identified gaps. It provides 50+ attacks, 28 defense modules, the Robustness Diagnostic Index (RDI), and gradient-masking detection. It supports multi-norm evaluation (l1, l2, linf, semantic, and spatial threat models) and compliance mapping to NIST AI RMF, OWASP LLM Top 10, and the EU AI Act. Empirical validation on RobustBench shows that Auto-ART's pre-screening identifies gradient masking in 92% of flagged cases, and that RDI rankings correlate highly with rankings from a full AutoAttack evaluation. Multi-norm evaluation exposes a 23.5 pp gap between average and worst-case robustness on state-of-the-art models. No prior work combines this kind of structured meta-scientific analysis with an executable evaluation framework that turns gaps identified in the literature into engineering practice.
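The gap between average and worst-case robustness can be illustrated with a minimal sketch. The abstract does not define how either quantity is computed, so this sketch assumes that "average" is the mean of per-threat-model robust accuracies and that "worst-case" counts an example as robust only if it survives every threat model (the union of threat models). The function name `robustness_gap`, its input layout, and the toy data are hypothetical. They are not Auto-ART's API and not results from the paper.

```python
from typing import Dict, List


def robustness_gap(per_norm_correct: Dict[str, List[bool]]) -> Dict[str, float]:
    """Compare average and worst-case (union) robust accuracy across threat models.

    per_norm_correct maps a threat-model name to per-example flags that are
    True when the model stayed correct under that threat model's attack.
    All values are returned in percent; the gap is in percentage points.
    """
    norms = list(per_norm_correct)
    if not norms:
        raise ValueError("at least one threat model is required")
    n = len(per_norm_correct[norms[0]])
    if n == 0 or any(len(flags) != n for flags in per_norm_correct.values()):
        raise ValueError("all threat models must cover the same non-empty example set")

    # Robust accuracy per threat model, in percent.
    per_norm_acc = {k: 100.0 * sum(v) / n for k, v in per_norm_correct.items()}
    average = sum(per_norm_acc.values()) / len(norms)

    # Worst case: an example counts only if it survives every threat model.
    survives_all = [all(per_norm_correct[k][i] for k in norms) for i in range(n)]
    worst_case = 100.0 * sum(survives_all) / n

    return {"average": average, "worst_case": worst_case, "gap_pp": average - worst_case}


# Toy data (made up for illustration): each threat model breaks a different example.
toy = {
    "linf": [True, True, True, False],
    "l2": [True, True, False, True],
    "l1": [True, False, True, True],
}
result = robustness_gap(toy)
print(result)
```

In the toy data every threat model leaves three of four examples intact, yet only one example survives all three attacks. Averaging per-norm accuracies hides this, which is why a worst-case figure can sit far below the average.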