With the ever-growing size of pre-trained models (PTMs), it has become an emerging practice to provide users with inference APIs only, a setting known as model-as-a-service (MaaS). To adapt PTMs whose parameters are frozen, most current approaches focus on the input side, seeking powerful prompts that stimulate models to give correct answers. However, we argue that input-side adaptation can be arduous because gradient signals are unavailable, and that such methods usually require thousands of API queries, resulting in high computation and time costs. In light of this, we present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side. Specifically, DecT first extracts prompt-stimulated output scores for initial predictions. On top of that, we train an additional decoder network on the output representations to incorporate posterior data knowledge. Because it relies on gradient-based optimization, DecT can be trained within several seconds and requires only one PTM query per sample. Empirically, we conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
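The output-side recipe described above can be illustrated with a minimal sketch. It assumes that each sample has already been sent to the frozen PTM exactly once and that the API returned two things, which are cached: prompt-stimulated class scores (`scores`) and an output representation (`reps`). The decoder here is a plain linear layer added on top of the initial scores and trained by gradient descent on cross-entropy; this is an illustrative stand-in, not the decoder architecture used in the paper, and the names `train_decoder`, `predict`, and the toy data are all hypothetical.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_decoder(reps, scores, labels, n_classes, lr=0.1, steps=200):
    """Fit a linear decoder on cached PTM outputs (assumed form, not the paper's).

    Final logits are: scores + reps @ W + b, so the decoder only has to learn
    a correction to the prompt-based initial prediction. No PTM query happens
    inside this loop; only W and b receive gradients.
    """
    n, d = reps.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        probs = softmax(scores + reps @ W + b)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * reps.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(reps, scores, W, b):
    # Combine initial prompt scores with the decoder's correction.
    return np.argmax(scores + reps @ W + b, axis=1)


# Toy stand-in for cached API outputs: 16 samples, 4-dim representations, 2 classes.
rng = np.random.default_rng(0)
labels = np.array([0, 1] * 8)
reps = np.where(labels[:, None] == 0, 2.0, -2.0) + 0.5 * rng.normal(size=(16, 4))
scores = 0.5 * rng.normal(size=(16, 2))  # weak, noisy initial scores

W, b = train_decoder(reps, scores, labels, n_classes=2)
acc_initial = (scores.argmax(axis=1) == labels).mean()
acc_decoded = (predict(reps, scores, W, b) == labels).mean()
print(f"initial accuracy: {acc_initial:.2f}, with decoder: {acc_decoded:.2f}")
```

With a zero-initialized decoder, `predict` reduces to the prompt-only prediction, so training starts from the initial scores rather than from scratch. All optimization runs on the cached arrays, which is why the query cost stays at one PTM call per sample in this sketch.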