S$^4$ST: A Strong, Self-transferable, faSt, and Simple Scale Transformation for Transferable Targeted Attack

Transferable Targeted Attacks (TTAs) face significant challenges due to severe overfitting to surrogate models. Recent breakthroughs rely heavily on large-scale training data of victim models, while data-free solutions, \textit{i.e.}, gradient optimization with image transformations, often depend on black-box feedback for method design and tuning. These dependencies violate black-box transfer settings and compromise the fairness of threat evaluation. In this paper, we propose two blind estimation measures, self-alignment and self-transferability, to analyze per-transformation effectiveness and cross-transformation correlations under strict black-box constraints. Our findings challenge conventional assumptions: (1) Attacking simple scaling transformations uniquely enhances targeted transferability, outperforming other basic transformations and rivaling leading complex methods; (2) Geometric and color transformations exhibit high internal redundancy despite weak inter-category correlations. These insights drive the design and tuning of S$^4$ST (Strong, Self-transferable, faSt, Simple Scale Transformation), which integrates dimensionally consistent scaling, complementary low-redundancy transformations, and block-wise operations. Extensive evaluations across diverse architectures, training distributions, and tasks show that S$^4$ST achieves a state-of-the-art effectiveness-efficiency balance without data dependency. We reveal that scaling's effectiveness stems from the multi-scale nature of visual data and the ubiquitous use of scale augmentation during training, rendering such augmentation a double-edged sword. Further validation on medical imaging and face verification confirms the framework's strong generalization.
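To make the idea of gradient optimization over scaled views concrete, the following is a minimal toy sketch, not the S$^4$ST implementation. Everything in it is an assumption for illustration: the surrogate is a hypothetical linear scorer for the target class, the loss is the raw target logit, resampling is nearest-neighbour about the image centre with one factor shared by both axes, and the names `scale_view`, `grad_through_scale`, `scaled_targeted_attack` and the default scale range are invented here.

```python
import random

def scale_view(img, s):
    """Rescale a square image about its centre by factor s, keeping its size.

    Returns the view and, per output pixel, the source index it copies
    (None where a shrunken image leaves zero padding).
    """
    n = len(img)
    c = (n - 1) / 2.0
    view = [[0.0] * n for _ in range(n)]
    src = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            ii = int(round((i - c) / s + c))
            jj = int(round((j - c) / s + c))
            if 0 <= ii < n and 0 <= jj < n:
                view[i][j] = img[ii][jj]
                src[i][j] = (ii, jj)
    return view, src

def target_logit(weights, img):
    """Toy linear surrogate (assumption): target-class score is <weights, img>."""
    return sum(w * p for wr, pr in zip(weights, img) for w, p in zip(wr, pr))

def grad_through_scale(weights, src):
    """Gradient of target_logit(weights, scale_view(x)) with respect to x."""
    n = len(weights)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if src[i][j] is not None:
                ii, jj = src[i][j]
                g[ii][jj] += weights[i][j]  # scatter back to the copied pixel
    return g

def scaled_targeted_attack(x0, weights, eps=0.1, alpha=0.02, steps=20,
                           copies=4, scale_range=(0.5, 2.0), seed=0):
    """Iterative sign-gradient ascent on the target logit, summed over
    randomly scaled views. All hyperparameters are illustrative guesses."""
    rng = random.Random(seed)
    n = len(x0)
    x = [row[:] for row in x0]
    for _ in range(steps):
        g = [[0.0] * n for _ in range(n)]
        for _ in range(copies):
            s = rng.uniform(*scale_range)
            _, src = scale_view(x, s)
            gs = grad_through_scale(weights, src)
            for i in range(n):
                for j in range(n):
                    g[i][j] += gs[i][j]
        for i in range(n):
            for j in range(n):
                sign = (g[i][j] > 0) - (g[i][j] < 0)
                v = x[i][j] + alpha * sign
                # Project onto the L-infinity ball around x0, then onto [0, 1].
                v = min(max(v, x0[i][j] - eps), x0[i][j] + eps)
                x[i][j] = min(max(v, 0.0), 1.0)
    return x
```

A call such as `scaled_targeted_attack(x0, weights)` with an `n x n` list-of-lists image in `[0, 1]` returns a perturbed image within `eps` of `x0`. In a realistic setting the linear scorer would be replaced by a deep surrogate network with automatic differentiation; the toy version only shows where the random scaling enters the gradient computation.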
