Toward Trustworthy AI: Multi-Target Adversarial Attacks and Robust Defenses for Continuous Data Summarization

Trustworthy AI requires reliable data-processing pipelines, not only robust downstream predictive models. As an upstream component, data summarization determines which information is retained and passed to subsequent learning or decision modules. Adversarial perturbations to the summarization process can therefore compromise trustworthy AI from upstream: they may alter the selected summary, reduce its representativeness, and in turn degrade the utility of subsequent learning tasks. In this paper, we study adversarial attacks on continuous data summarization under similarity-level perturbations through the lens of DR-submodular optimization. We show that a class of multi-resolution image summarization objectives can be formulated as multilinear extensions of non-negative submodular set functions and that these objectives satisfy DR-submodularity with $m$-weak monotonicity. We then formulate multi-target attack generation as a min-max problem, in which a single admissible perturbation of the similarity structure is optimized to degrade multiple target summarization models. To mitigate such perturbations, we formulate robust defense against mixed attack types as a regularized max-min problem. For both problems, we develop approximation algorithms with theoretical guarantees. Experiments on real-data benchmarks and controlled clustered benchmarks show that the proposed attack is effective in representative low-to-moderate budget regimes and can induce downstream task-performance loss. The proposed defense improves the robustness--mitigation trade-off in structured settings, while also revealing the parameter sensitivity of robust protection on real data.
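
To make the terms "multilinear extension" and "DR-submodularity" concrete, the following minimal sketch evaluates the multilinear extension F(x) = E[f(R)] exactly on a tiny ground set, where R contains each item j independently with probability x[j]. It is an illustration under stated assumptions, not the paper's objective or algorithm: the facility-location function, the 3x3 similarity matrix `SIM`, and all function names are hypothetical choices made for this example.

```python
from itertools import product


def facility_location(sim, subset):
    """Assumed set function: f(S) = sum_i max_{j in S} sim[i][j], f(empty) = 0."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def multilinear_extension(sim, x):
    """Exact F(x) = E[f(R)], where item j is in R independently with prob. x[j].

    Enumerates all 2^n subsets, so it is only suitable for very small n.
    """
    n = len(x)
    total = 0.0
    for mask in product((0, 1), repeat=n):
        prob = 1.0
        for j, bit in enumerate(mask):
            prob *= x[j] if bit else 1.0 - x[j]
        if prob > 0.0:
            subset = [j for j, bit in enumerate(mask) if bit]
            total += prob * facility_location(sim, subset)
    return total


def partial_derivative(sim, x, j):
    """dF/dx_j = F(x | x_j = 1) - F(x | x_j = 0), since F is linear in x_j."""
    hi = list(x)
    hi[j] = 1.0
    lo = list(x)
    lo[j] = 0.0
    return multilinear_extension(sim, hi) - multilinear_extension(sim, lo)


# Hypothetical non-negative similarity matrix (rows: items to cover,
# columns: candidate summary items). Values are made up for illustration.
SIM = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

if __name__ == "__main__":
    x = [0.1, 0.1, 0.1]
    y = [0.6, 0.6, 0.6]  # y dominates x coordinate-wise
    for j in range(3):
        # DR-submodularity: marginal gains shrink as the point grows.
        print(j, partial_derivative(SIM, x, j), partial_derivative(SIM, y, j))
```

At a 0/1 indicator vector, F coincides with the set function f on the corresponding set, and for x ≤ y coordinate-wise each partial derivative at x is at least the one at y, which is the diminishing-returns (DR) property. A similarity-level perturbation in the sense of the abstract would correspond to modifying the entries of `SIM` before evaluating F.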
