This paper introduces RDA, a pioneering approach designed to address two primary deficiencies of previous work on stealing pre-trained encoders: (1) suboptimal performance attributed to biased optimization objectives, and (2) elevated query costs stemming from the end-to-end paradigm, which requires querying the target encoder in every epoch. Specifically, we first Refine the representations of the target encoder for each training sample, thereby establishing a less biased optimization objective before the steal-training phase. This is accomplished via a sample-wise prototype, which consolidates the target encoder's representations of various perspectives of a given sample. Prototypes require exponentially fewer queries than the end-to-end approach and, once instantiated, guide the subsequent query-free training. To further improve effectiveness, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs while Aligning matched ones in terms of both amplitude and angle. In this way, the trained surrogate encoder achieves state-of-the-art results across a range of downstream datasets with limited queries. Moreover, RDA is shown to be robust to multiple widely used defenses.
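To make the Refine, Discriminate and Align steps concrete, the following is a minimal sketch in plain Python, not the paper's actual formulation. The abstract only says that a prototype "consolidates" representations and that the loss covers "amplitude and angle", so several choices here are assumptions: the prototype is taken to be the mean over augmented views, amplitude alignment is a mean squared error, angle alignment is one minus cosine similarity, and discrimination is an InfoNCE-style cross-entropy. All names (`build_prototype`, `align_loss`, `discriminate_loss`, `temperature`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def build_prototype(target_encoder: Callable, augment: Callable,
                    sample, num_views: int) -> Vector:
    """Refine: query the target once per view and average (averaging is an assumption)."""
    reps = [target_encoder(augment(sample, v)) for v in range(num_views)]
    dim = len(reps[0])
    return [sum(r[i] for r in reps) / num_views for i in range(dim)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)


def align_loss(embedding: Vector, prototype: Vector) -> float:
    """Align: penalize amplitude (MSE) and angle (1 - cosine) gaps of a matched pair."""
    amplitude = sum((e - p) ** 2 for e, p in zip(embedding, prototype)) / len(prototype)
    angle = 1.0 - cosine(embedding, prototype)
    return amplitude + angle


def discriminate_loss(embeddings: List[Vector], prototypes: List[Vector],
                      temperature: float = 0.1) -> float:
    """Discriminate: embedding i should be closer to prototype i than to any other."""
    total = 0.0
    for i, emb in enumerate(embeddings):
        logits = [cosine(emb, p) / temperature for p in prototypes]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(embeddings)
```

In this sketch the target encoder is queried `num_views` times per sample, once, before training. The surrogate is then trained only against the stored prototypes, which mirrors the query-free training phase described above.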