Due to privacy or commercial constraints, large pre-trained language models (PLMs) are often offered as black-box APIs. Fine-tuning such models for downstream tasks is challenging because one can neither access the model's internal representations nor propagate gradients through it. This paper addresses these challenges by developing techniques for adapting PLMs with only API access. Building on recent work on soft prompt tuning, we develop methods to tune soft prompts without computing gradients. We further develop extensions that, in addition to being gradient-free, require no access to the PLM's internal representations beyond its input embeddings. Moreover, instead of learning a single prompt, our methods learn a distribution over prompts, which allows us to quantify predictive uncertainty. Ours is the first work to consider uncertainty in prompts when only API access to the PLM is available. Finally, through extensive experiments, we carefully vet the proposed methods and find them competitive with (and sometimes better than) gradient-based approaches that have full access to the PLM.
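The abstract does not say which derivative-free optimizer is used, so the following is only a minimal sketch under stated assumptions, not the paper's method. It swaps in the cross-entropy method (CEM), a standard derivative-free optimizer, to fit a diagonal Gaussian over flattened soft-prompt vectors using nothing but scalar loss queries. It then averages predictions over prompts drawn from that distribution to get a rough uncertainty estimate. `query_api_loss` and `query_api_prob` are hypothetical toy stand-ins for a black-box PLM API (a quadratic loss and a logistic output). All other names and hyperparameters are likewise illustrative assumptions.

```python
# Illustrative sketch only: CEM is an assumed stand-in, not the paper's optimizer,
# and the "API" functions below are hypothetical toys, not a real PLM service.
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]

# Hypothetical hidden optimum for the toy API (illustration only).
_HIDDEN_OPTIMUM = [0.5, -1.0, 2.0, 0.0]


def query_api_loss(prompt: Vector) -> float:
    """Hypothetical black-box API: returns only a scalar task loss.

    No gradients or hidden states are exposed; a toy quadratic stands in
    for a real PLM evaluated with a flattened soft prompt.
    """
    return sum((p - t) ** 2 for p, t in zip(prompt, _HIDDEN_OPTIMUM))


def query_api_prob(prompt: Vector) -> float:
    """Hypothetical black-box API: returns P(label = 1) for a fixed input."""
    return 1.0 / (1.0 + math.exp(-sum(prompt)))


def fit_prompt_distribution(
    loss_fn: Callable[[Vector], float],
    dim: int,
    iterations: int = 60,
    population: int = 40,
    elite_frac: float = 0.25,
    min_std: float = 0.05,
    seed: int = 0,
) -> Tuple[Vector, Vector]:
    """Fit a diagonal Gaussian over soft prompts with the cross-entropy method.

    Each round samples prompts, keeps the lowest-loss "elite" fraction, and
    refits the mean and per-dimension std to the elites. Only forward loss
    queries are used; min_std keeps the distribution from collapsing.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        samples = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(population)
        ]
        samples.sort(key=loss_fn)  # one black-box loss query per sample
        elites = samples[:n_elite]
        mean = [sum(e[i] for e in elites) / n_elite for i in range(dim)]
        std = [
            max(min_std, math.sqrt(sum((e[i] - mean[i]) ** 2 for e in elites) / n_elite))
            for i in range(dim)
        ]
    return mean, std


def predictive_summary(
    mean: Vector,
    std: Vector,
    predict_fn: Callable[[Vector], float],
    n_samples: int = 200,
    seed: int = 1,
) -> Tuple[float, float]:
    """Average predictions over prompts drawn from the learned distribution.

    Returns the mean predicted probability and its standard deviation, a
    simple measure of how much the prediction depends on the sampled prompt.
    """
    rng = random.Random(seed)
    preds = [
        predict_fn([rng.gauss(m, s) for m, s in zip(mean, std)])
        for _ in range(n_samples)
    ]
    mu = sum(preds) / n_samples
    spread = math.sqrt(sum((p - mu) ** 2 for p in preds) / n_samples)
    return mu, spread


if __name__ == "__main__":
    mean, std = fit_prompt_distribution(query_api_loss, dim=4)
    mu, spread = predictive_summary(mean, std, query_api_prob)
    print("prompt mean:", [round(m, 3) for m in mean])
    print("prompt std: ", [round(s, 3) for s in std])
    print(f"P(label=1) = {mu:.3f} +/- {spread:.3f}")
```

The CEM search distribution is a heuristic. It is not a Bayesian posterior over prompts, and its final spread is set partly by `min_std`, so any uncertainty read off it should be treated as illustrative.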