Language model-based instruction-following systems have recently shown strong performance on many benchmark tasks, demonstrating the ability to adapt to a broad variety of instructions. However, such systems are often not designed to be transparent about their limitations: a user can easily prompt a model with an instruction without knowing whether the responses are likely to be accurate, or whether the system is even capable of performing the task. We propose a third-party performance prediction framework, in which a separate model is trained to predict the evaluation metric that an instruction-following system would obtain on a task, assuming access only to the system's inputs and outputs at inference time. We perform this analysis with a variety of open and closed instruction-following models as well as multiple performance predictors, and examine the effect of factors such as model size, number of training tasks, and prompt format. Our findings indicate that third-party performance prediction is very challenging, and much work remains in developing predictors that can automatically reveal the limitations of modern instruction-following natural language processing systems.
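As a rough illustration of the setup only, and not the predictors studied here, the following minimal Python sketch assumes a hypothetical hashed bag-of-words featurizer (`featurize`) over the instruction and the system's output, and a linear least-squares regressor (`PerformancePredictor`) fit by stochastic gradient descent. The names, feature design, hyperparameters, and toy scores are all illustrative assumptions; the property it is meant to show is that the predictor sees only the system's inputs and outputs, never its internals.

```python
import math
import re
import zlib

# Illustrative sketch: feature design and hyperparameters are assumptions.
# Hashed feature size; index DIM is reserved for a constant bias feature.
DIM = 2 ** 18


def featurize(instruction: str, output: str, dim: int = DIM) -> dict[int, float]:
    """Sparse hashed bag-of-words over the instruction and the system's output.

    Only black-box inputs and outputs are used, never model internals.
    Tokens are prefixed so instruction and output words get separate slots.
    """
    feats: dict[int, float] = {}
    for prefix, text in (("in", instruction), ("out", output)):
        for tok in re.findall(r"\w+", text.lower()):
            idx = zlib.crc32(f"{prefix}:{tok}".encode()) % dim
            feats[idx] = feats.get(idx, 0.0) + 1.0
    # L2-normalize the token counts so every example has a comparable scale.
    norm = math.sqrt(sum(v * v for v in feats.values())) or 1.0
    feats = {k: v / norm for k, v in feats.items()}
    feats[dim] = 1.0  # constant feature acting as a bias term
    return feats


class PerformancePredictor:
    """Hypothetical linear regressor from (instruction, output) to a metric score."""

    def __init__(self, dim: int = DIM, lr: float = 0.5, epochs: int = 200):
        self.dim, self.lr, self.epochs = dim, lr, epochs
        self.w: dict[int, float] = {}  # sparse weights, implicitly zero

    def _score(self, x: dict[int, float]) -> float:
        return sum(self.w.get(j, 0.0) * v for j, v in x.items())

    def fit(self, examples: list[tuple[str, str, float]]) -> "PerformancePredictor":
        """Minimize squared error by SGD; examples are (instruction, output, metric)."""
        data = [(featurize(i, o, self.dim), y) for i, o, y in examples]
        for _ in range(self.epochs):
            for x, y in data:
                err = self._score(x) - y
                for j, v in x.items():
                    self.w[j] = self.w.get(j, 0.0) - self.lr * err * v
        return self

    def predict(self, instruction: str, output: str) -> float:
        return self._score(featurize(instruction, output, self.dim))


if __name__ == "__main__":
    # Toy, made-up training data: (instruction, system output, metric score).
    train = [
        ("Summarize the article.", "The article discusses climate policy.", 0.9),
        ("Summarize the article.", "I cannot help with that.", 0.1),
        ("Translate to French.", "Bonjour le monde.", 0.8),
        ("Translate to French.", "I cannot help with that.", 0.0),
    ]
    model = PerformancePredictor().fit(train)
    print(model.predict("Answer the question.", "I cannot help with that."))
    print(model.predict("Answer the question.", "The answer is Paris."))
```

On this toy data, the refusal-style output on an unseen instruction should receive a lower predicted score than the direct answer. A linear model over surface tokens is used here only because it is transparent and dependency-free; a stronger predictor could instead be a fine-tuned pretrained model trained on the same (input, output, metric) triples.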